Methods and Systems for Storage Virtual Machine Migration Between Clusters of a Networked Storage System

ABSTRACT

Methods and systems for Vserver migration are provided. One method includes generating a consistency group (CG) having a plurality of source storage volumes of a source storage virtual machine (Vserver) of a source cluster for a migrate operation to migrate the source storage volumes as a group to a plurality of destination storage volumes of a destination cluster; establishing a mirroring relationship between the source and destination cluster for managing asynchronous transfer of the source storage volumes in the CG to the destination storage volumes during a transfer phase of the migrate operation; replicating a logical interface of the source cluster to the destination cluster, the logical interface providing a network address to access the source cluster; and automatically selecting a destination port at the destination cluster, associated with the replicated logical interface.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims priority under 35 U.S.C. 119(a) to the Provisional Indian Patent Application, Serial No. 202141049497, entitled “METHODS AND SYSTEMS FOR STORAGE VIRTUAL MACHINE MIGRATION BETWEEN CLUSTERS OF A NETWORKED STORAGE SYSTEM”, filed on Oct. 29, 2021, the disclosure of which is incorporated herein by reference in its entirety.

Technical Field

The present disclosure relates to storage systems and more particularly, to storage virtual machine (also referred to as a “Vserver”) migration from a source cluster to a destination cluster of a networked storage environment.

Background

Various forms of storage systems are used today. These forms include direct attached storage, network attached storage (NAS) systems, storage area networks (SANs), and others. Storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data, backing up data and others.

A storage system typically includes at least one computing system (may also be referred to as a “server” or “storage server”), which is a computer processing system configured to store and retrieve data on behalf of one or more client computing systems (“clients”). The storage system may be presented to a client system as a virtual storage system (also interchangeably referred to as a storage virtual machine (“SVM”) or “Vserver” throughout this specification) with storage space for storing information. The Vserver is associated with a physical storage system but operates as an independent system for handling client input/output (I/O) requests.

A Vserver may be migrated from a source cluster to a destination cluster. The term cluster in this sense means a configuration that includes a plurality of nodes/modules (e.g., network modules and storage modules) to enable access to networked storage. It is desirable to efficiently complete a migration operation from the source cluster to the destination cluster with minimal disruption to client computing systems that use the Vserver to store and retrieve data. Continuous efforts are being made to develop technology for efficiently migrating a Vserver from one cluster to another.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing features and other features will now be described with reference to the drawings of the various aspects. In the drawings, the same components have the same reference numerals. The illustrated aspects are intended to illustrate, but not to limit the present disclosure. The drawings include the following Figures:

FIG. 1 shows an example of a storage environment, used according to one aspect of the present disclosure;

FIG. 2 shows a block diagram of a cluster-based storage system in a networked storage environment, used according to one aspect of the present disclosure;

FIG. 3A shows an example of a node used in a cluster-based storage system, used according to one aspect of the present disclosure;

FIG. 3B shows migration of a source Vserver from a source cluster to a destination cluster, according to one aspect of the present disclosure;

FIG. 3C shows a high-level block diagram of an architecture of a system for migrating the source Vserver, according to one aspect of the present disclosure;

FIG. 4A shows a detailed block level diagram of a system for migrating the source Vserver, according to one aspect of the present disclosure;

FIG. 4B shows another block level diagram of a system for migrating the source Vserver, according to one aspect of the present disclosure;

FIG. 5A shows a setup phase of a migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 5B shows a transfer phase of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 5C shows a pre-commit stage of a cut-over phase of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 5D shows a commit stage of the cut-over phase of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 5E shows a post commit phase of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 5F shows a post cut-over phase and a final clean up phase of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 6 shows a state diagram for the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 7A shows a pause phase of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 7B shows a process for handling cloud backup during the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 7C shows a process for volume placement of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 7D shows logical interface (“LIF”) placement for the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 7E shows a process flow for failure handling of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 7F shows another process flow for failure handling of the migrate operation to migrate the source Vserver to the destination cluster, according to one aspect of the present disclosure;

FIG. 8 shows a block diagram of a storage operating system, used according to one aspect of the present disclosure; and

FIG. 9 shows an example of a processing system used according to one aspect of the present disclosure.

DETAILED DESCRIPTION

In one aspect, innovative technology is provided to migrate a Vserver (also referred to as a storage virtual machine (“SVM”), or a virtual storage system) from a source cluster to a destination cluster of a networked storage system. Vservers are typically used in a storage cluster architecture, described below. Typically, a data center may use multiple clusters. A Vserver is a data container in a clustered storage system that enables access to storage. It is desirable to move a Vserver from one cluster to another with minimal or no disruption. Non-disruption in this context refers to a maximum acceptable duration during which a client application executed by a client computing system does not receive a response from the networked storage system. The innovative technology disclosed herein enables efficient transfer of Vserver configuration information along with constituent data volumes that store application data and volume metadata from the source cluster to the destination cluster. From a client system's perspective there is no disruption to data access.

In one aspect, the Vserver migration process includes various phases, including a setup phase, a transfer phase, a cutover commit phase, a post cutover phase and a final cleanup phase, described below in detail. The various aspects of the present disclosure include at least the following innovative features of the various phases of a migrate operation:

Setup Phase: Group Control (by a storage module (e.g., 216, FIG. 2)): During this phase, a group is created of the volumes belonging to a source Vserver (e.g., 320, FIG. 3C) in the storage module. Group control exists close to a data transfer engine (e.g., 348/349, FIG. 3C) in the storage module, which allows for efficient interaction between a control plane (e.g., 338, FIG. 3C) and a data transfer engine in a data plane (e.g., 340, FIG. 3C) that transfers data to a destination cluster (e.g., 328, FIG. 3B).

Orchestration (Failure Handling): Separate master processes (e.g., 342 and 404, FIG. 4A) are executed in the source cluster (e.g., 326, FIG. 3C) and destination cluster (e.g., 328, FIG. 4A) to handle failure scenarios. Recovery is based on an idempotent principle (implemented by all components). The types of failure handled include cluster failure, node failure, process failure, network port failure, network partitioning and others, as described below in detail.

LIF (Logical Interface) or volume placement includes granular volume and aggregate placement that supports volume to aggregate maps at the destination cluster. Volume placement is based on properties including capacity, storage tiers, user preference and others as described below in detail. LIF placement ensures volume affinity to avoid cross-node traffic after migrating; and source volume configuration is preserved.

Transfer Phase: During this phase, data transfers are performed using an asynchronous transfer engine (e.g., 348, FIG. 4A); and a migration operation can be “paused” for additional control within a migrate outage window, described below in detail.

Cutover Pre-commit Phase: During this phase, relationships are transferred to a synchronous engine to ensure a short cutover window. NFS (Network File System) delegations are revoked to prepare for cutover.

Cutover Commit Phase: During this phase, a locking mechanism (PONR (Point of no return) technique) is used to avoid split-brain scenarios and a persistent state at replicated databases (e.g., 432A/432B, FIG. 4A) is maintained on both the source and destination clusters. PONR in the cutover phase means that the source Vserver cannot be accessed from the source cluster, as described below in detail. A snapshot can be taken at the point of cutover for a data integrity check after cutover. The term snapshot in this context means a point-in-time copy that captures all the information in a storage volume.

Post Cutover Phase: During this phase, the last volume configuration is fetched and applied on the destination cluster.

Cleanup Phase: During this phase, a source Vserver is preserved for data integrity before deletion. This allows for the source Vserver to be brought back as a primary Vserver, if there is a failure, as described below in detail.
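
As an illustration only, the phase ordering and the idempotent recovery principle noted above can be pictured with a minimal Python sketch. The names used here (Phase, MigrateOperation, run_phase) are hypothetical and are not part of the disclosed system; the sketch assumes a simple in-memory record of completed phases, whereas an actual implementation would persist state. It shows only that repeating an already completed phase after a failure does not repeat its work.

```python
from enum import Enum


class Phase(Enum):
    """Phases of the migrate operation, in the order named in the text."""
    SETUP = 1
    TRANSFER = 2
    CUTOVER_PRECOMMIT = 3
    CUTOVER_COMMIT = 4
    POST_CUTOVER = 5
    CLEANUP = 6


class MigrateOperation:
    """Hypothetical tracker; completed phases are kept in memory only."""

    def __init__(self):
        self.completed = []  # phases finished so far, in order

    def run_phase(self, phase, action):
        """Run `action` for `phase` at most once, so a retry is harmless."""
        if phase in self.completed:
            return False  # already done; retrying after a failure is a no-op
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise ValueError(f"expected {expected.name}, got {phase.name}")
        action()  # if this raises, the phase is not recorded and may be retried
        self.completed.append(phase)
        return True
```

In this sketch a master process that restarts after a failure could simply call run_phase again for every phase in order; phases that already completed are skipped.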

As a preliminary note, as used in this disclosure, the terms “component,” “module,” “system,” and the like are intended to refer to a computer-related entity, either software-executing general-purpose processor, hardware, firmware, or a combination thereof. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. Also, these components can execute from various computer readable media having various data structures stored thereon.

The components may communicate via local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems via the signal). Computer executable components can be stored, for example, on non-transitory computer readable media including, but not limited to, an ASIC (application specific integrated circuit), CD (compact disc), DVD (digital video disk), ROM (read only memory), flash memory, hard disk, EEPROM (electrically erasable programmable read only memory), or any other storage device, in accordance with the claimed subject matter.

Storage Environment 100: FIG. 1 shows an example of a networked operating environment 100 (also referred to as system 100) used according to various aspects of the present disclosure. As an example, system 100 may include a plurality of storage systems 120A-120N (may also be referred to as storage server/storage servers/storage controller/storage controllers 120, and also referred to as an “on-premise” storage system 120) executing a storage operating system 124A-124N (may also be referred to as storage operating system 124 or storage operating systems 124). In one aspect, the storage system 120 (or a cloud storage OS 140, described below in detail) can be organized into any suitable number of Vservers, in which each Vserver represents a single storage system namespace with a separate network access. Each Vserver has a specific client domain and a security domain that are separate from the client domain and the security domain of other Vservers. Moreover, each Vserver can span one or more physical nodes, each of which can hold storage associated with one or more Vservers.

Each Vserver is addressable by client systems and handles input/output (also referred to as “I/O” or “IO”) commands, just like storage system 120. Each Vserver is associated with a physical storage system (e.g., a storage sub-system 116). Each Vserver is assigned a unique access address that is used by a client computing system to access the storage system 120. For example, each Vserver is assigned an Internet Protocol (IP) address (also referred to as a LIF) that is used by a client system to send I/O commands. The IP address from an IP address space may be assigned when the Vserver is configured using a management module 134 executed by a management system 132.
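
As a non-limiting illustration, assigning each Vserver a unique IP address from an IP address space can be sketched in Python using the standard ipaddress module. The function name assign_lif_addresses, the Vserver names and the documentation address range 192.0.2.0/29 are assumptions made for this example and do not describe the management module 134.

```python
import ipaddress


def assign_lif_addresses(address_space, vserver_names):
    """Give each Vserver a distinct host address from the address space.

    Hypothetical helper: addresses are handed out in ascending order and
    a ValueError is raised once the address space has no hosts left.
    """
    hosts = iter(ipaddress.ip_network(address_space).hosts())
    assignments = {}
    for name in vserver_names:
        try:
            assignments[name] = str(next(hosts))
        except StopIteration:
            raise ValueError("address space exhausted") from None
    return assignments


# Example: two Vservers drawn from a small documentation-only range.
lifs = assign_lif_addresses("192.0.2.0/29", ["vs1", "vs2"])
```

Because every address is taken from a single iterator over the address space, no two Vservers in one call can receive the same address.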

System 100 also includes a plurality of computing systems 102A-102N (shown as host 102, 102A-102N and may also be referred to as a “host system 102”, “host systems 102”, “server 102” or “servers 102”) and user systems 108A-108N (may also be referred to as “user system 108,” “user systems 108,” “client system 108” or “client systems 108”) that may access storage space provided by a cloud layer 136 and/or the storage-subsystem 116 managed by the storage systems 120 (or Vservers) via a connection system 118 such as a local area network (LAN), wide area network (WAN), the Internet and others. The storage-subsystem 116 includes a plurality of storage devices 114A-114N (may also be referred to as storage device/storage devices/disk/disks 114). It is noteworthy that the term “disk” as used herein is intended to mean any storage device/space and not to limit the adaptive aspects to any particular type of storage device, for example, hard disks.

In one aspect, the storage system 120 uses the storage operating system 124 to store and retrieve data from the storage sub-system 116 by accessing the storage devices 114. Data is stored and accessed using read and write requests that are also referred to as input/output (I/O) requests. The storage devices 114 may be organized as one or more RAID groups. The various aspects disclosed herein are not limited to any storage device type or storage device configuration.

In one aspect, system 100 includes the cloud layer 136 having a cloud storage manager (may also be referred to as “cloud manager”) 122, and a cloud storage operating system (may also be referred to as “Cloud Storage OS”) 140 having access to cloud storage 128. The cloud storage manager 122 enables configuration and management of storage resources.

The system and techniques described above are applicable and especially useful in the cloud computing environment where storage is presented and shared across different platforms. Cloud computing means computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that may be rapidly provisioned and released with minimal management effort or service provider interaction. The term “cloud” is intended to refer to a network, for example, the Internet and cloud computing allows shared resources, for example, software and information to be available, on-demand, like a public utility.

Typical cloud computing providers deliver common business applications online which are accessed from another web service or software like a web browser, while the software and data are stored remotely on servers. The cloud computing architecture uses a layered approach for providing application services. A first layer is an application layer that is executed at client computers. In this example, the application allows a client to access storage via a cloud. After the application layer is a cloud platform and cloud infrastructure, followed by a “server” layer that includes hardware and computer software designed for cloud specific services.

As an example, a cloud provider 104 provides access to the cloud layer 136 and its components via a communication interface 112. A non-limiting example of the cloud layer 136 is a cloud platform, e.g., Amazon Web Services (“AWS”) provided by Amazon Inc., Azure provided by Microsoft Corporation, Google Cloud Platform provided by Alphabet Inc. (without derogation of any trademark rights of Amazon Inc., Microsoft Corporation or Alphabet Inc.), or any other cloud platform. In one aspect, communication interface 112 includes hardware, circuitry, logic and firmware to receive and transmit information using one or more protocols. As an example, the cloud layer 136 can be configured as a virtual private cloud (VPC), a logically isolated section of a cloud infrastructure that simulates an on-premises data center with the on-premise storage system 120.

In one aspect, the cloud manager 122 is provided as a software application running on a computing device or within a virtual machine (“VM”) for configuring, protecting and managing storage objects. In one aspect, the cloud manager 122 enables access to a storage service (e.g., backup, restore, cloning or any other storage related service) from a “micro-service” made available from the cloud layer 136. In one aspect, the cloud manager 122 stores user information including a user identifier, a network domain for a user device, a user account identifier, or any other information to enable access to storage from the cloud layer 136.

Software applications for cloud-based systems are typically built using “containers,” which may also be referred to as micro-services. Kubernetes is an open-source software platform for deploying, managing and scaling containers including the cloud storage OS 140, and the cloud manager 122. Azure is a cloud computing platform provided by Microsoft Corporation (without derogation of any third-party trademark rights) for building, testing, deploying, and managing applications and services including the cloud storage OS 140, and cloud manager 122. Azure Kubernetes Service enables deployment of a production ready Kubernetes cluster in the Azure cloud for executing the cloud storage OS 140, and the cloud manager 122. It is noteworthy that the adaptive aspects of the present disclosure are not limited to any specific cloud platform.

The term micro-service as used herein denotes computing technology for providing a specific functionality in system 100 via the cloud layer 136. As an example, the cloud storage OS 140, and the cloud manager 122 are micro-services, deployed as containers (e.g., “Docker” containers), stateless in nature, may be exposed as a REST (representational state transfer) application programming interface (API) and are discoverable by other services. Docker is a software framework for building and running micro-services using the Linux operating system kernel (without derogation of any third-party trademark rights). As an example, when implemented as Docker containers, Docker micro-service code for the cloud storage OS 140, and the cloud manager 122 is packaged as a “Docker image file”. A Docker container for the cloud storage OS 140, and the cloud manager 122 is initialized using an associated image file. A Docker container is an active or running instantiation of a Docker image. Each Docker container provides isolation and resembles a lightweight virtual machine. It is noteworthy that many Docker containers can run simultaneously in a same Linux based computing system. It is noteworthy that although a single block is shown for the cloud manager 122 and the cloud storage OS 140, multiple instances of each micro-service (i.e., the cloud manager 122 and the cloud storage OS 140) can be executed at any given time to accommodate multiple user systems 108.

In one aspect, the cloud manager 122 and the cloud storage OS 140 can be deployed from an elastic container registry (ECR). As an example, ECR is provided by AWS (without derogation of any third-party trademark rights) and is a managed container registry that stores, manages, and deploys container images. The various aspects described herein are not limited to the Linux kernel or using the Docker container framework.

An example of the cloud storage OS 140 includes the “CLOUD VOLUMES ONTAP” provided by NetApp Inc., the assignee of this application (without derogation of any trademark rights). The cloud storage OS 140 is a software defined version of a storage operating system 124 executed within the cloud layer 136 or accessible to the cloud layer 136 to provide storage and storage management options that are available via the storage system 120. The cloud storage OS 140 has access to cloud storage 128, which may include block-based, persistent storage that is local to the cloud storage OS 140 and object-based storage that may be remote to the cloud storage OS 140.

In another aspect, in addition to cloud storage OS 140, a cloud-based storage service is made available from the cloud layer 136 to present storage volumes (shown as cloud volume 142). An example of the cloud-based storage service is the “Cloud Volume Service,” provided by NetApp Inc. (without derogation of any trademark rights). The term volume or cloud volume (used interchangeably throughout this specification) means a logical object, also referred to as a storage object, configured to store data files (or data containers or data objects), scripts, word processing documents, executable programs, and any other type of structured or unstructured data. From the perspective of a user system 108, each cloud volume can appear to be a single storage drive. However, each cloud volume can represent the storage space in one storage device, an aggregate of some or all the storage space in multiple storage devices, a RAID group, or any other suitable set of storage space. The various aspects of the present disclosure may include both the cloud storage OS 140 and the cloud volume service or either one of them.

As an example, user systems 108 are computing devices that can access storage space at the storage system 120 via the connection system 118 or from the cloud layer 136 presented by the cloud provider 104 or any other entity. The user systems 108 can also access computing resources, as a VM (e.g., compute VM 110) via the cloud layer 136. A user may be the entire system of a company, a department, a project unit or any other entity. Each user system is uniquely identified and optionally, may be a part of a logical structure called a storage tenant (not shown). The storage tenant represents a set of users (may also be referred to as storage consumers) for the cloud provider 104 that provides access to cloud-based storage and/or compute resources (e.g., 110) via the cloud layer 136 and/or storage managed by the storage system 120.

In one aspect, host systems 102 are configured to also execute a plurality of processor-executable applications 126A-126N (may also be referred to as “application 126” or “applications 126”), for example, a database application, an email server, and others. These applications may be executed in different operating environments, for example, a virtual machine environment, Windows, Solaris, Unix (without derogation of any third-party rights) and others. The applications 126 use storage system 120 or cloud storage 128 to store information at storage devices. Although hosts 102 are shown as stand-alone computing devices, they may be made available from the cloud layer 136 as compute nodes executing applications 126 within VMs (shown as compute VM 110).

Each host system 102 interfaces with the management module 134 of a management system 132 for managing backups, restore, cloning and other operations for the storage system 120. The management module 134 is used for managing and configuring various elements of system 100. Management system 132 may include one or more computing systems for managing and configuring the various elements of system 100. Although the management system 132 with the management module 134 is shown as a stand-alone module, it may be implemented with other applications, for example, within a virtual machine environment. Furthermore, the management system 132 and the management module 134 may also be referred to interchangeably throughout this specification.

In one aspect, the storage system 120 provides a set of storage volumes directly to host systems 102 via the connection system 118. In another aspect, the storage volumes are presented by the cloud storage OS 140, and in that context a storage volume is referred to as a cloud volume (e.g., 142). The storage operating system 124/cloud storage OS 140 present or export data stored at storage devices 114/cloud storage 128 as a volume (or a logical unit number (LUN) for storage area network (“SAN”) based storage).

The storage operating system 124/cloud storage OS 140 are used to store and manage information at storage devices 114/cloud storage 128 based on a request generated by application 126, user 108 or any other entity. The request may be based on file-based access protocols, for example, the Common Internet File System (CIFS) protocol or Network File System (NFS) protocol, over the Transmission Control Protocol/Internet Protocol (TCP/IP). Alternatively, the request may use block-based access protocols for SAN storage, for example, the Small Computer Systems Interface (SCSI) protocol encapsulated over TCP (iSCSI) and SCSI encapsulated over Fibre Channel (FC), object-based protocol or any other protocol.

In a typical mode of operation, one or more I/O requests are sent over connection system 118 to the storage system 120 or the cloud storage OS 140, based on the request. Storage system 120/cloud storage OS 140 receives the I/O requests, issues one or more I/O commands to storage devices 114/cloud storage 128 to read or write data on behalf of the host system 102 and issues a response containing the requested data over the connection system 118 to the respective host system 102.

Although storage system 120 is shown as a stand-alone system, i.e., a non-cluster-based system, in another aspect, storage system 120 may have a distributed architecture; for example, a cluster-based system that may include a separate network module and storage module, described below in detail. Briefly, the network module is used to communicate with host systems 102, while the storage module is used to communicate with the storage devices 114.

Alternatively, storage system 120 may have an integrated architecture, where the network and data components are included within a single chassis. The storage system 120 further may be coupled through a switching fabric to other similar storage systems (not shown) which have their own local storage subsystems. In this way, all the storage subsystems can form a single storage pool, to which any client of any of the storage servers has access.

As an example, one or more of the host systems (for example, 102A-102N) or a compute resource (not shown) of the cloud layer 136 may execute a VM environment where a physical resource is time-shared among a plurality of independently operating processor executable VMs (including compute VM 110). Each VM may function as a self-contained platform, running its own operating system (OS) and computer executable application software. The computer executable instructions running in a VM may also be collectively referred to herein as “guest software.” In addition, resources available within the VM may also be referred to herein as “guest resources.”

The guest software expects to operate as if it were running on a dedicated computer rather than in a VM. That is, the guest software expects to control various events and have access to hardware resources on a physical computing system (may also be referred to as a host system) which may also be referred to herein as “host hardware resources”. The host hardware resource may include one or more processors, resources resident on the processors (e.g., control registers, caches, and others), memory (instructions residing in memory, e.g., descriptor tables), and other resources (e.g., input/output devices, host attached storage, network attached storage or other like storage) that reside in a physical machine or are coupled to the host system.

Communication between the management module 134 and storage system 120 may be accomplished using any of the various conventional communication protocols and/or application programming interfaces (APIs), the details of which are not germane to the technique being introduced here. This communication can be done through the connection system 118 or it can be done via a direct link (not shown) between the management system 132 and one or more of the storage systems.

Clustered Networked Storage System: The aspects disclosed above have been described with respect to a non-cluster-based storage system 120 that may have a traditional monolithic architecture where a storage server has access to a dedicated storage subsystem. However, the adaptive aspects can be implemented in a cluster-based system that has a distributed architecture and where Vservers (222A-222N) can be migrated from one cluster to another. The cluster-based system is described below in detail.

FIG. 2 depicts an illustrative aspect of a storage environment 200 including a plurality of client systems 204.1-204.2 (similar to clients 108A-108N and host 102), a clustered storage system 202 and at least one network 206 communicably connecting the client systems 204.1-204.2 and the clustered storage system 202. As shown in FIG. 2, the clustered storage system 202 includes a plurality of nodes 208.1-208.3, a cluster switching fabric 210, and a plurality of mass storage devices 212.1-212.3 (similar to 114, FIG. 1).

Each of the plurality of nodes 208.1-208.3 is configured to include anetwork module, a storage module, and a management module, each of whichcan be implemented as a separate processor executable, or machineimplemented module. Specifically, node 208.1 includes a network module214.1, a storage module 216.1, and a management module 218.1, node 208.2includes a network module 214.2, a storage module 216.2, and amanagement module 218.2, and node 208.3 includes a network module 214.3,a storage module 216.3, and a management module 218.3.

The network modules 214.1-214.3 include functionality that enables therespective nodes 208.1-208.3 to connect to one or more of the clientsystems 204.1-204.2 over the computer network 206, while the storagemodules 216.1-216.3 connect to one or more of the storage devices212.1-212.3 that are part of a storage sub-system, similar to 116.

The management modules 218.1-218.3 provide management functions for the clustered storage system 202. Accordingly, each of the plurality of server nodes 208.1-208.3 in the clustered storage server arrangement provides the functionality of a storage server.

A switched virtualization layer including a plurality of virtual interfaces (VIFs) 220 is provided below the interface between the respective network modules 214.1-214.3 and the client systems 204.1-204.2, allowing storage 212.1-212.3 associated with the nodes 208.1-208.3 to be presented to the client systems 204.1-204.2 as a single shared storage pool. For example, the switched virtualization layer may implement a virtual interface architecture. FIG. 2 depicts only the VIFs 220 at the interfaces to the network modules 214.1, 214.3 for clarity of illustration.

The clustered storage system 202 can be organized into any suitable number of Vservers 222A-222N, in which each Vserver represents a single storage system namespace with separate network access. As mentioned above, each Vserver has a user domain and a security domain that are separate from the user and security domains of other virtual storage systems. Client systems 204 can access storage space via a Vserver from any node of the clustered system 202.

Each of the nodes 208.1-208.3 may be defined as a computer adapted to provide application services to one or more of the client systems 204.1-204.2. In this context, a Vserver is an instance of an application service provided to a client system. The nodes 208.1-208.3 are interconnected by the switching fabric 210, which, for example, may be embodied as a Gigabit Ethernet switch or any other switch type.

Although FIG. 2 depicts three network modules 214.1-214.3, the storage modules 216.1-216.3, and the management modules 218.1-218.3, any other suitable number of network modules, storage modules, and management modules may be provided. There may also be different numbers of network modules, storage modules, and/or management modules within the clustered storage system 202. For example, in alternative aspects, the clustered storage system 202 may include a plurality of network modules and a plurality of storage modules interconnected in a configuration that does not reflect a one-to-one correspondence between the network modules and storage modules.

The client systems 204.1-204.2 of FIG. 2 may be implemented as general-purpose computers or VMs configured to interact with the respective nodes 208.1-208.3 in accordance with a client/server model of information delivery. In the presently disclosed aspect, the interaction between the client systems 204.1-204.2 and the nodes 208.1-208.3 enables the provision of network data storage services. Specifically, each client system 204.1, 204.2 may request the services of one of the respective nodes 208.1, 208.2, 208.3, and that node may return the results of the services requested by the client system by exchanging packets over the computer network 206, which may be wire-based, optical fiber, wireless, or any other suitable combination thereof. The client systems 204.1-204.2 may issue packets according to file-based access protocols, such as the NFS or CIFS protocol, when accessing information in the form of files and directories.

In a typical mode of operation, one of the client systems 204.1-204.2 transmits an NFS or CIFS request for data to one of the nodes 208.1-208.3 within the clustered storage system 202, and the VIF 220 associated with the respective node receives the client request. It is noted that each VIF 220 within the clustered system 202 is a network endpoint having an associated IP address, and that each VIF can migrate from network module to network module. The client request typically includes a file handle for a data file stored in a specified volume at storage 212.1-212.3.

Storage System Node: FIG. 3A is a block diagram of a node 208.1 that is illustratively embodied as a storage system comprising a plurality of processors 302A and 302B, a memory 304, a network adapter 310, a cluster access adapter 312, a storage adapter 316 and local storage 313 interconnected by a system bus 308. The local storage 313 comprises one or more storage devices utilized by the node to locally store configuration information (e.g., in a configuration data structure 314).

Node 208.1 may manage a plurality of storage volumes for a Vserver that is migrated from one cluster to another. The system and processes for migrating Vservers are described below in more detail.

The cluster access adapter 312 comprises a plurality of ports adapted to couple node 208.1 to other nodes of cluster 100. In the illustrative aspect, Ethernet may be used as the clustering protocol and interconnect media, although it will be apparent to those skilled in the art that other types of protocols and interconnects may be utilized within the cluster architecture described herein. In alternate aspects where the network and storage modules are implemented on separate storage systems or computers, the cluster access adapter 312 is utilized by the network and storage modules for communicating with other network and storage modules in the cluster 100.

Each node 208.1 is illustratively embodied as a dual processor storage system executing a storage operating system 306 (similar to 124, FIG. 1) that preferably implements a high-level module, such as a file system, to logically organize the information as a hierarchical structure of named directories and files on storage 212.1. However, it will be apparent to those of ordinary skill in the art that the node 208.1 may alternatively comprise a single or more than two processor systems. Illustratively, one processor 302A executes the functions of the network module 104 on the node, while the other processor 302B executes the functions of the storage module 216.

The memory 304 illustratively comprises storage locations that are addressable by the processors and adapters for storing programmable instructions and data structures. The processor and adapters may, in turn, comprise processing elements and/or logic circuitry configured to execute the programmable instructions and manipulate the data structures. It will be apparent to those skilled in the art that other processing and memory means, including various computer readable media, may be used for storing and executing program instructions pertaining to the invention described herein.

The storage operating system 306, portions of which are typically resident in memory and executed by the processing elements, functionally organizes the node 208.1 by, inter alia, invoking storage operations in support of the storage service implemented by the node.

The network adapter 310 comprises a plurality of ports adapted to couple the node 208.1 to one or more clients 204.1/204.2 over point-to-point links, wide area networks, virtual private networks implemented over a public network (Internet) or a shared local area network. The network adapter 310 thus may comprise the mechanical, electrical and signaling circuitry needed to connect the node to the network. Illustratively, the computer network 206 may be embodied as an Ethernet network or a Fibre Channel network. Each client 204.1/204.2 may communicate with the node over network 206 by exchanging discrete frames or packets of data according to pre-defined protocols, such as TCP/IP. In one aspect, LIF placement for a migrated Vserver involves selecting a port of the network adapter 310, as described below in detail.

The storage adapter 316 cooperates with the storage operating system 306 executing on the node 208.1 to access information requested by the clients. The information may be stored on any type of attached array of writable storage device media such as solid-state drives, optical, magnetic tape, bubble memory, storage class memory, electronic random-access memory, micro-electromechanical and any other similar media adapted to store information, including data and parity information. However, as illustratively described herein, the information is preferably stored on storage device 212.1. The storage adapter 316 comprises a plurality of ports having input/output (I/O) interface circuitry that couples to the storage devices over an I/O interconnect arrangement, such as a conventional high-performance, FC link topology. It is noteworthy that instead of separate network adapter 310 and storage adapter 316, node 208.1 may use a converged adapter that performs the functionality of a storage adapter and a network adapter.

Vserver Migration: FIG. 3B shows an example of migrating a source Vserver 320 from a source cluster 326 to a destination Vserver 324 at a destination cluster 328. Clusters 326 and 328 are similar to cluster 202 described above with respect to FIG. 2 having a plurality of nodes 208. The Vserver 320 is presented to clients 204. The clients 204 can read and write data using source storage volumes 330A-330N (may also be referred to as source volume or source volumes 330) at the source cluster 326. The storage volumes may be managed by one or more nodes 333A-333N (similar to nodes 208 of FIG. 2) of the source cluster 326.

Upon migration, the destination storage volumes 332A-332N (may also be referred to as destination volume or destination volumes 332) are managed by nodes 335A-335N (similar to nodes 208 of FIG. 2) of the destination cluster 328. For efficiently migrating the Vserver 320, the source volumes 330 are configured as a logical structure, referred to as a consistency group (“CG”) 331, that is uniquely identified. The CG 331 is used to implement group control for migrating the source volumes of Vserver 320 to the destination cluster 328, as described below in detail.

To migrate Vserver 320 during a migration operation, first the destination Vserver 324 is created at the destination cluster 328 during a setup phase. The destination volumes 332 are then created at the destination cluster 328 to store information associated with source volumes 330 at the source cluster 326. The various migrate operation phases are described below in detail.

Architecture 334: FIG. 3C shows a block diagram of an architecture 334 for executing the various phases of a migrate operation to migrate the source Vserver 320 from the source cluster 326 to the destination cluster 328, according to one aspect of the present disclosure. As an example, architecture 334 includes a management plane 336, a control plane 338 and a data plane 340. The management plane 336 may be implemented by a management module 218 (FIG. 2) and includes a migrate Orchestrator 342 (also referred to as Orchestrator 342) that executes or interfaces with a plurality of threads/modules, e.g., a pre-check module 343A, a set-up module 343B and management logic 343C that are described below. The migrate Orchestrator 342 also interfaces with a configuration replication service (CRS) 344 that replicates configuration information of the source Vserver 320 to the destination cluster 328, also described below in detail. The configuration information of the source Vserver 320 includes a source Vserver name, identifier, universal identifier (“UUID”), nodes that are associated with the Vserver, client systems that can access the Vserver with associated permissions, the volume identifiers identifying volumes 330 or any other information. The configuration information also includes information regarding the volumes, e.g., volume identifiers, volume size, volume attributes (e.g., whether the volumes have a space guarantee, whether the volume is thin provisioned, and any quality of service associated with the volumes), access control information indicating the permissions associated with each volume 330 or any other information. The management plane 336 also includes a group management module 347 that manages migration of information for source volumes 330 as a CG (e.g., 331, FIG. 3B), also described below in detail.

The control plane 338 executes a group control module 346 that includes or interfaces with state control logic 345A and cut-over logic 345B. The state control logic 345A maintains the state of the migrate operation, as described below, while the cut-over logic 345B controls a cut-over phase of the migrate operation, also described below in detail.

The data plane 340 includes an asynchronous engine 348 that enables asynchronous transfer of data of the plurality of source volumes 330 in the CG 331. The data plane 340 also includes a synchronous engine 349 that is used to transfer information to the destination cluster 328 during a cutover phase. In one aspect, the data plane is implemented at the storage modules 216, which are closer to the storage devices 212. This improves the overall efficiency for migrating the Vserver 320, as described below in detail.

System 400: FIGS. 4A-4B show examples of an innovative architecture 400 to enable migration between the source cluster 326 and the destination cluster 328, according to one aspect of the present disclosure. The following provides a brief description of the various components of FIGS. 4A/4B, and a brief introduction of certain terms used in this disclosure.

Cluster Communication 402: The source cluster 326 and the destination cluster 328 communicate using connection 402. The connection 402 uses a network connection for transferring information between the cluster nodes.

Monarch Node: A node (e.g., 335A, FIG. 3C) on the destination cluster 328 that hosts a primary group control module 346B in a storage module (e.g., storage module 216, FIG. 2).

Owning Node: A node (e.g., 335A) on the destination cluster 328 that hosts the migrate Orchestrator 342.

Cluster Persistent Storage (CPS): A processor executable service that offers metadata volume (MDV) storage (e.g., 430A/430B) for use by cluster applications such as CRS 344.

Vserver Director Module (VDM): A component in the master CRS process that manages creation and flow of a Vserver stream. The Vserver DM (e.g., 408A/408B) is used to create and update source Vserver configuration (e.g., 426A) objects on the destination cluster 328.

Vserver Stream: A CRS construct that connects the source Vserver 320 of the source cluster 326 to the destination Vserver 324 at the destination cluster 328. Configuration baselines and updates made to the source Vserver 320 flow over the Vserver stream to the destination cluster 328.

Source Cutover Timer 412A: A timer at the source cluster 326 used by the cutover logic 345B to track the progress of a cutover workflow, as described below. If the timer 412A expires before the cutover workflow reaches a point of no return (PONR), the migrate operation is aborted on the source cluster 326. A similar destination cutover timer 412B is used in the destination cluster 328.

PONR: PONR is a stage within the cutover workflow, which is reached after all destination volumes 428B/430B (similar to 332A-332N of FIG. 3C) have been converted to read/write volumes and before starting the destination Vserver 324 LIF. Once the PONR is reached, the source cluster 326 cannot start the source Vserver 320 from a source cluster node.
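The interaction between a cutover timer and the PONR can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the `CutoverTimer` class, the `cutover_action` function and the choice to keep going once the PONR has been reached are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CutoverTimer:
    """Hypothetical stand-in for a cutover timer such as 412A/412B."""
    started_at: float   # time (seconds) at which the cutover workflow began
    timeout_s: float    # allowed duration before the timer expires

    def expired(self, now: float) -> bool:
        return now - self.started_at >= self.timeout_s


def cutover_action(timer: CutoverTimer, now: float, ponr_reached: bool) -> str:
    """Return "abort" or "continue" for the cutover workflow.

    Assumption: once the PONR is reached, timer expiry no longer aborts
    the workflow, because the source Vserver can no longer be started.
    """
    if ponr_reached:
        return "continue"
    return "abort" if timer.expired(now) else "continue"
```

For example, with a 30 second timer started at time 0, `cutover_action` returns "abort" at time 31 only if the PONR has not been reached.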

Migrate Orchestrator 342: This is a processor executable thread within a management module space (e.g., the management plane 336, FIG. 3C) to perform and manage various migrate operation related tasks in the background once a UI (User Interface)/REST endpoint 444 returns a confirmation to a client system to begin the migrate operation. This thread runs at a node (e.g., 335, FIG. 3C) of the destination cluster 328. This thread creates the destination Vserver 324 on the destination cluster 328 with the same Vserver name and Vserver identifier of the source Vserver 320.

CRS (344A/344B): The Vserver migrate operation uses the CRS 344A/344B for configuration information replication. CRS 344A/344B provides a framework to replicate configuration data 426A from the source cluster 326 to the destination cluster 328 (shown as 426B at the destination cluster 328). The Vserver migrate operation uses this module/service to replicate objects in a Vserver domain. When the destination cluster 328 receives the configuration information, each object/module can control how this object is created/modified on the destination cluster 328. For volume objects received by the destination cluster 328, the system auto-picks where the volume is created based on aggregate capability, headroom, and space availability on destination aggregates, as described below in detail. In the alternative, a client system has an option to provide a list of aggregates where the destination volumes should be created.

Config Agent 414A/414B: This module operates between the CRS 344A/344B and Vserver DM 408A/408B, respectively. A CRS stream is created by the Orchestrator 342 between the source cluster 326 and destination cluster 328 to replicate Vserver scoped objects and operations. This module interacts with CRS 344 to set up metadata volumes 430B required for configuration replication, interacts with the source cluster 326 and destination cluster 328 to handle CRS configuration baseline replication, and also handles failures in configuration replication by retrying operations when necessary.

Polling Agent 410A/410B: This module provides a framework to create polling tasks to poll for an event or completion of a task for the migrate operation. Different components use this module to poll, e.g., every T seconds. This module iterates through a list of pending polling objects and polls for events/asynchronous tasks. The polling object is deleted when a corresponding event occurs, or a task is completed. For example, this module is used by the Config Agent 414A/414B to poll for completion of a baseline configuration information transfer and start a next step in the migrate operation.
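The polling behavior described above may be sketched as follows. This is only an illustrative sketch; the `PollingAgent` class and its `add_task` and `poll_once` methods are hypothetical names, and the caller is assumed to invoke `poll_once` every T seconds.

```python
from typing import Callable, List, Tuple

# A polling object pairs a completion check with a follow-up action.
PollingObject = Tuple[Callable[[], bool], Callable[[], None]]


class PollingAgent:
    """Hypothetical sketch of a polling agent such as 410A/410B."""

    def __init__(self) -> None:
        self._pending: List[PollingObject] = []

    def add_task(self, is_done: Callable[[], bool],
                 on_done: Callable[[], None]) -> None:
        self._pending.append((is_done, on_done))

    def poll_once(self) -> int:
        """Poll every pending object once and return how many remain."""
        still_pending: List[PollingObject] = []
        for is_done, on_done in self._pending:
            if is_done():
                on_done()          # e.g., start the next migrate step
            else:
                still_pending.append((is_done, on_done))
        # Completed polling objects are dropped from the pending list.
        self._pending = still_pending
        return len(self._pending)
```

A component such as the Config Agent could register a check for "baseline transfer complete" together with a callback that starts the next step of the migrate operation.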

Migrate RDB (Replicated Database) table(s) 432A/432B: The RDB tables 432A/432B maintain a list of Vserver migrate operations on both the source cluster 326 and destination cluster 328 and a state of the migrate operation, at any given time. The Vserver migrate operation uses “Group Synchronous” mirroring relationships and maintains RDB entries on both the source cluster 326 and destination cluster 328 to track the mirroring relationships. In one aspect, as a non-limiting example, SnapMirror (without derogation to any trademark rights) technology, provided by NetApp Inc., the assignee of this application, is used to mirror information between source and destination cluster nodes. The adaptive aspects of the present disclosure are not limited to any specific mirroring technology.

Failure Module 406A/406B: This module rehosts migrate operation threads if a failure is detected during the migrate operation. This module is registered on both the source and destination clusters 326 and 328, respectively, for a callback when the cluster nodes go online or offline. If the owning node becomes unresponsive or the management module becomes unhealthy, the Orchestrator 342 is rehosted on another node. The owning node information and a state of the migrate operation are tracked persistently in the migration RDB tables 432A/432B. This persistent information indicates to the failure module 406A/406B which recovery steps to take. For example, if the migrate operation is in the cutover phase, and if the node on which the cutover timer thread runs dies, the cutover timer thread is restarted on another node. If the owning node becomes unresponsive, the failure module, upon receiving a notification, performs recovery operations based on the current state of the migrate operation.
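As an illustration of the rehosting decision, the following sketch models an RDB entry as a plain dictionary. The `owning_node` key, the `rehost_orchestrator` function and the policy of picking the first healthy node are assumptions made for the example, not details from the disclosure.

```python
from typing import Dict, List, Optional


def rehost_orchestrator(rdb_entry: Dict[str, str],
                        healthy_nodes: List[str]) -> Optional[str]:
    """Move the Orchestrator to a healthy node if its owning node is down.

    Returns the new owning node, or None when no rehosting is needed.
    The "owning_node" key is a hypothetical field name.
    """
    if rdb_entry["owning_node"] in healthy_nodes:
        return None  # owning node is still healthy
    if not healthy_nodes:
        raise RuntimeError("no healthy node available for rehosting")
    new_owner = healthy_nodes[0]  # assumption: simplest possible choice
    # Persist the new owner so that a later failure can be recovered too.
    rdb_entry["owning_node"] = new_owner
    return new_owner
```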

Migrate Source Management Module 404: This module operates in the source cluster 326 and may be used to update RDB table 432A entries, perform pre-checks, and execute post-migration operations or abort operations.

Group Management Module 347A: This module interfaces with/reuses the management module to manage group synchronous relationships. The following are some of the operations performed by this module: it creates the group synchronous relationship initialization when a migrate operation is started, creates a CG 331 representing the source Vserver (e.g., 320, FIG. 3C) with the Vserver volumes (e.g., 430A/428A or 330A/330N, FIG. 3C), and starts an appropriate “Group workflow” based on the migrate operation. Seed items and item-mapping to the respective destination cluster storage module, based on the source-to-destination volume mapping, are executed by the Configuration Agent 414B.

Group Control Module 346A/346B: This module is executed in the storage module to perform group workflow for the migrate operation, as described below in detail.

Create and Auto Initialize Module (or API) 425A/425B: This module/API is used when migration is started or resumed by a client system. This module creates a CG synchronous relationship between the source cluster 326 and destination cluster 328 with a single CG 331 containing all the volumes in the Vserver 320; establishes the group relationships, starts a baseline transfer (auto-initialize), and back-to-back asynchronous transfer using the asynchronous engines 348A/348B. This module can be used even if the volumes at the destination cluster 328 are already partially initialized due to a prior paused/failed migrate operation. This module reuses already transferred data without requiring a re-transfer of all the data in the volume. This module can also be used if some of the volumes are already initialized, and some were added after a pause/failure. This module monitors back-to-back asynchronous transfers of each volume, and when all the volumes reach the “Ready for Cutover” criteria, it declares “Ready for Cutover” status to the Orchestrator.

Cutover Pre-Commit Module 422A/422B: This module may be part of the cut-over module 345A/345B and is used when the Orchestrator 342 has completed cutover pre-commit processing. The workflow executed by this module converts existing asynchronous mirroring relationships into synchronous relationships; waits for ongoing back-to-back transfer to complete, starts a last asynchronous transfer, and then transitions to an “INSYNC” state. In one aspect, each volume independently reaches the “INSYNC” state without coordinating with other nodes and other volumes in the CG 331. Once all the volumes reach the INSYNC state, the module declares a “Cutover Pre-Commit Complete” status to the Orchestrator 342. The Orchestrator 342, using the polling agent 410B, periodically polls for this status update. The INSYNC state indicates that the volumes at the destination cluster 328 are synchronized with the volumes at the source cluster 326.
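The group-level status can be derived from the per-volume states as sketched below. The `VolumeState` values other than INSYNC and the `pre_commit_complete` function are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Dict


class VolumeState(Enum):
    """Hypothetical per-volume mirroring states for the sketch."""
    BACK_TO_BACK = "back_to_back"      # ongoing asynchronous transfers
    LAST_TRANSFER = "last_transfer"    # final asynchronous transfer
    INSYNC = "insync"                  # destination matches the source


def pre_commit_complete(states: Dict[str, VolumeState]) -> bool:
    """Return True only when every volume in the CG has reached INSYNC.

    Each volume reaches INSYNC independently; this check is what a
    periodic poll could evaluate. An empty group is treated as not
    complete (an assumption of this sketch).
    """
    return bool(states) and all(
        state is VolumeState.INSYNC for state in states.values()
    )
```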

Cutover-Commit Module 424A/424B: This module may be part of the cut-over module 345A/345B and is used when the Orchestrator 342 has completed the cutover pre-commit phase and the cutover-source commit steps, and calls to perform a cutover commit operation. The following steps are performed for each volume: drain and fence at the source cluster 326, which quiesces and drains any outstanding I/Os; transfer any metadata tracked outside the volume to the destination cluster 328; and convert all volumes to read/write volumes to make the volumes read-writable at the destination cluster 328. When the cutover-commit is completed, the Orchestrator 342 is notified. The progress of the commit phase can also be monitored through polling by the polling agent 410B.

Delete Module 420A/420B: This module is used to delete the mirroring relationships created on the destination cluster 328. This module is used when the migrate operation completes/pauses/fails or after a user initiated “abort” operation is complete. At the destination cluster 328, there are three options that can be used to control what snapshots are deleted, namely: RETAIN_ALL_SNAPSHOTS: If the pause or migrate operation failed, then this input is used to retain the snapshots created for the migrate operation. Retaining the snapshots enables data copy to resume from where the operation was stopped, when the migrate operation is resumed/restarted later.

RETAIN_NO_SNAPSHOTS: This option is used to delete destination cluster 328 snapshots when the migrate operation completes. This deletes all snapshots created during the various phases of the migrate operation; and RETAIN_ONLY_FINAL_SNAPSHOT: This option is used to delete all the snapshots except a final snapshot. The final snapshot is retained to perform a data integrity check between the source cluster 326 and the destination cluster 328, as described below in detail.
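The effect of the three retention options can be illustrated with the following sketch. The option names come from the description above; the `SnapshotPolicy` enum, the `snapshots_to_delete` function and the convention that the list is ordered oldest first (so the last entry is the final snapshot) are assumptions of the example.

```python
from enum import Enum
from typing import List


class SnapshotPolicy(Enum):
    RETAIN_ALL_SNAPSHOTS = 1
    RETAIN_NO_SNAPSHOTS = 2
    RETAIN_ONLY_FINAL_SNAPSHOT = 3


def snapshots_to_delete(policy: SnapshotPolicy,
                        snapshots: List[str]) -> List[str]:
    """Return the migrate snapshots to delete at the destination.

    `snapshots` is assumed to be ordered oldest first.
    """
    if policy is SnapshotPolicy.RETAIN_ALL_SNAPSHOTS:
        return []                    # keep everything for a later resume
    if policy is SnapshotPolicy.RETAIN_NO_SNAPSHOTS:
        return list(snapshots)       # migrate operation completed
    # Keep only the final snapshot for the data integrity check.
    return list(snapshots[:-1])
```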

Cleanup Module 418A/418B: This module is used to delete mirroring relationships created at the source cluster 326 from the destination cluster 328. This module is used when the migrate operation completes/pauses/fails, or a client-initiated abort operation is completed. At the source cluster 326, a “relationship-info-only” parameter controls whether snapshots created for the migrate operation are deleted during release. The relationship-info-only=true setting is used when the pause or migrate operation failed. Retaining the snapshots allows mirroring to resume data copy from where it was stopped when the migrate operation is resumed/restarted later and allows a previously INSYNC relationship to revert to INSYNC without re-initializing the destination volumes 428B/430B. When the relationship-info-only parameter is set to “false”, this option is used to delete snapshots at the source cluster 326 when the migrate operation is completed.

Abort Module 416A/416B: This module is used to pause an existing migrate operation. This aborts the ongoing transfer of the entire CG 331. The mirroring relationship could be initializing or performing back-to-back transfer. This module is not used if the migrate operation is already in the cutover phase (i.e., the pre-commit/commit/post-commit phase). Once the CG mirroring is aborted, this module provides a notification to the Orchestrator 342 for abort completion. Once the abort command completes successfully, the system can assume that transfers which were running will stop.

Synchronous Engine 349A-349B: This module is used to transfer information for CG 331 between the source cluster 326 and the destination cluster 328 synchronously, after a baseline transfer has been completed, as disclosed below.

Asynchronous Engine 348A-348B: This module is used to transfer information for CG 331 between the source cluster 326 and the destination cluster 328 asynchronously, during the baseline transfer, as disclosed below.

It is noteworthy that although the various modules of FIG. 4A are shown in separate blocks, these modules may be combined in any order and may be located or interface with each other in any order.

FIG. 4B shows another example of the architecture 400 and its various modules described above with respect to FIG. 4A. In FIG. 4B, the various modules at the source cluster 326 and the destination cluster 328 are split in the user space 440A/440B and kernel space 442A/442B, respectively. The source nodes 333A/333B interface with the destination nodes 335A/335B using connections 402A-402D. The various modules of FIG. 4B have been described above with respect to FIG. 4A and, for the sake of brevity, are not described again.

FIGS. 5A-5F show process flow diagrams for the various phases/stages of a migrate operation to migrate the source Vserver 320, according to one aspect of the present disclosure. The following describes the various phases/stages of the migrate operation with respect to FIGS. 5A-5F using the various components described above with respect to FIGS. 4A-4B.

Setup Phase 500: FIG. 5A shows the setup phase process 500 of a migrate operation. In one aspect, the setup phase of the disclosed technology has various innovative features, including creating the CG 331 (FIG. 3C) with the source storage volumes (e.g., 330A-330N, FIG. 3C or 428A and 430A of FIGS. 4A/4B) belonging to the source Vserver 320 in the storage module. This enables group control close to a data transfer engine in the storage module, which allows for efficient interaction between the control plane 338 and a data transfer engine (e.g., 348 and/or 349, FIG. 3C) of the data plane 340. Separate master processes (e.g., 404 and 342, FIG. 4A) are executed in the source cluster 326 and destination cluster 328, respectively, to handle failure scenarios in the setup phase. Recovery is based on an idempotent principle (implemented by all components). As described below in detail, different types of failure can be handled, including cluster failure, node failure, process failure, network port failure, and network partitioning. Volume and aggregate granular placement support volume to aggregate maps on the destination cluster 328. Volume placement during the setup phase is based on properties such as capacity, storage tiers (i.e., performance and/or capacity tiers) and others. LIF placement is executed to ensure affinity to volumes to avoid cross-node traffic after migrating. Source volume configuration is preserved at the destination cluster 328. In another aspect, a client system can specify the aggregates where the destination volumes can be placed. The client system can also specify which node or port LIFs at the destination cluster 328 are to be used for the destination Vserver 324, after migration.

In one aspect, the setup phase of the migrate operation involves updating RDB tables 432A/432B at both the source cluster 326 and destination cluster 328 nodes to track migrate operation processing. The Orchestrator 342 thread is created on a node in the destination cluster 328 to perform various operations, described below. The node on which the Orchestrator 342 is running is tracked persistently by the RDB table 432B using “owning node” information. This enables restarting the Orchestrator 342 on other nodes, if the owning node fails.

The setup phase further includes creating the destination Vserver 324 at a node of the destination cluster 328. The destination Vserver 324 and the source Vserver 320 that is being migrated have the same name and UUID (universal identifier). The destination Vserver ID is different than the source Vserver ID. The destination Vserver uses the same MSID (master set identifier) as used by the source Vserver 320 for the volume that will be created later by the Configuration Agent 414B. The MSID is a volume identifier that does not change. The destination Vserver 324 created at this stage is placed in a “stopped” state and is enabled after the migration operation, as described below in detail.

The setup phase further includes setting up CRS transfer streams to replicate configuration information from the source cluster to the destination cluster. This replicates the objects within a Vserver-domain. As part of the CRS replication, definition of different objects is called to create objects on the destination cluster 328. This creates volumes and LIFs on the destination cluster 328. Certain objects may need special handling when they are created on the destination cluster 328. For example, the source Vserver 320 may contain a volume that is a destination of a mirroring relationship (i.e., the source Vserver 320 receives information mirrored from another Vserver or any other entity). Because the migrate operation also uses mirroring technology, after CRS replication with no special handling, the volume will result in two mirroring sources (one Vserver migrate source, and another mirror source of the volume). To avoid such problems, CRS skips applying this configuration information until the migration operation is in the post-cutover phase. Once the volumes are created on the destination cluster, a group synchronous mirroring relationship is created with the CG 331 containing the source volumes 430A/428A in the source Vserver 320. This uses module 425B, described above. The mirroring relationship uses a new “Migrate” policy. This policy and its information are stored persistently and are available in the management module and the storage module of the source cluster 326. The “Migrate” policy may not include an “auto-cutover” bit because the auto-cutover functionality is managed by the Orchestrator 342.

Referring to FIG. 5A, process 500 begins in block B502, when a plurality of pre-check operations is executed at both the source cluster 326 and the destination cluster 328. The RDB tables 432A/432B are created. The orchestrator 342 is initialized on an owning node of the destination cluster 328. An entry is created in the RDB tables 432A/432B identifying the owning node, the orchestrator 342, the migrate operation (e.g., a job identifier) and a state value indicating the setup phase of the migrate operation. The state control module 345A updates the initial state of the migrate operation.

In block B504, the orchestrator 342 creates the destination Vserver 324 with a same Vserver name and UUID as the source Vserver 320. The destination Vserver 324 identifier may be different from the Vserver 320 identifier. The orchestrator thread 342 configures the state of the destination Vserver 324 as “stopped”, which indicates that the destination Vserver 324 is not ready for use yet.

In block B506, a CRS stream is created by the configuration agent 414A to replicate the Vserver configuration data 426A at the destination cluster 328. Thereafter, in block B508, the destination storage volumes 428B/430B are selected and configured. In one aspect, storage volumes are selected based on properties such as capacity, fabric-pool, and others. In one aspect, the destination storage volumes 428B/430B are selected from a list of qualified aggregates. The list may be provided by a client system. In another aspect, the volumes are selected based on encryption requirements. In yet another aspect, destination volumes are selected based on available storage capacity, especially if the source volumes have a space guarantee. In another aspect, the destination volumes are selected based on performance criteria, e.g., latency, number of IOPS, available performance capacity or any other parameters, as described below in detail. The management system 132 (FIG. 1) collects storage volume performance data on a regular basis and this information is then used to select the destination volumes for the migrate operation.

Furthermore, in block B508, LIFs are created for the destination Vserver 324. LIF selection or placement is executed to ensure affinity to volumes and thereby avoid cross-node traffic after migration. In one aspect, a destination port from a given IP address space in the destination cluster 328 is selected based on level 2 (L2) connectivity with a source cluster port. The ports may be located at network adapters used in the source and destination clusters to communicate with each other. Both the destination and source ports are within a same subnet. This prevents any data outage after the migrate operation is complete.

In block B510, after the destination volumes are created at the destination cluster 328, a group mirroring relationship is created by generating the CG 331 with all the source volumes that will be mirrored to the destination volumes as a group. This relationship is generated by module 425B. This information is part of the “migrate” policy and is stored in both the storage module and the management module. An “auto-cutover” bit is also established by the orchestrator 342, if desired by the user. The auto-cutover bit may be stored in a job object that is created by the orchestrator 342 to track the migrate operation or at any other location. Thereafter, in block B512, the process moves to the transfer phase 501, described below with respect to FIG. 5B.

Transfer Phase 501: The transfer phase 501 of the migrate operation, as shown in FIG. 5B, is executed using the asynchronous engine 348A and synchronous engine 349A (FIG. 4A) using the group level mirroring relationships for the CG 331 created during the setup phase. The transfer phase 501 can be paused and then resumed, as described below in detail. The transfer phase 501 includes reusing the RDB tables 432A/432B to track group level mirroring relationships created with a migrate policy. The transfer phase performs an initial baseline transfer of the source volumes 428A/430A to the destination volumes 428B/430B managed by different nodes (e.g., 333A/333B, FIG. 4B) of the source cluster 326. Each node's group control module 346A coordinates completion of the baseline transfer of the volumes hosted on that node. A master group control module at a master node coordinates cross-node baseline transfer completions. While the baseline transfers are in progress, the source volumes 428A/430A continue to accept incoming I/Os that result in new changes. To keep the destination volumes 428B/430B closely synchronized with the source volumes, a new incremental snapshot of the source volumes is taken to replicate any incremental changes, as described below. This is executed continuously so that data on the destination volumes 428B/430B is close to the data on the source volumes 428A/430A. This is referred to as “back-to-back transfers”. In addition to data replication, the snapshots of the volumes are also replicated to the destination cluster. This includes user-created snapshots, system-created scheduled snapshots, and snapshots created for other use cases such as Asynchronous/Synchronous/cloud backup mirroring relationships, described below in detail.

Also, during the transfer phase, new snapshot creations are allowed, and any newly created snapshots are also replicated to the destination cluster 328. This is controlled by having a “mirror all snapshots” rule within the migrate policy. If the mirroring relationship is using the migrate policy, each volume reaches “Ready for Cutover” criteria when the last few (e.g., 3) back-to-back transfers are complete within a certain duration, e.g., 5 minutes. When all the nodes/volumes reach “Ready for Cutover” criteria, the master group control module declares the CG 331 as “Ready for Cutover”. The Orchestrator 342 uses the polling agent 410B to check on the progress and status of this phase.
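As an illustration only, the “Ready for Cutover” criteria described above can be sketched as follows. The sketch assumes one possible reading of the criteria, namely that the last N back-to-back transfers of a volume, measured from the start of the first to the completion of the last, fit within the stated duration. The names `Transfer`, `volume_ready_for_cutover` and `cg_ready_for_cutover` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transfer:
    start_s: float  # time the back-to-back transfer started, in seconds
    end_s: float    # time the back-to-back transfer completed, in seconds


def volume_ready_for_cutover(transfers: List[Transfer],
                             last_n: int = 3,
                             window_s: float = 300.0) -> bool:
    """Return True when the last `last_n` transfers fit inside `window_s`.

    Assumption: the window spans from the start of the oldest of the last
    `last_n` transfers to the completion of the newest one.
    """
    if len(transfers) < last_n:
        return False
    recent = transfers[-last_n:]
    return recent[-1].end_s - recent[0].start_s <= window_s


def cg_ready_for_cutover(per_volume: Dict[str, List[Transfer]]) -> bool:
    """The group is ready only when every volume in the group is ready."""
    return bool(per_volume) and all(
        volume_ready_for_cutover(t) for t in per_volume.values())
```

With the example values from the text (3 transfers, 5 minutes), a volume whose last three transfers span 200 seconds qualifies, while a volume whose last three transfers span 560 seconds does not, and the group as a whole is not declared ready until every volume qualifies.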

Even after declaring “Ready for Cutover”, the group control module 346A continues to perform back-to-back transfers, including transferring user created or scheduled snapshots, so that the destination volumes keep up with the changes at the source volumes. It is possible that after declaring “Ready for Cutover”, additional snapshots created on the source cluster before the Orchestrator 342 disables source snapshot creation may need to be transferred. These snapshots are transferred either as the back-to-back phase continues waiting for the cutover input from the Orchestrator 342 or are transferred during the “cutover-pre-commit” phase that is described below.

Furthermore, during the transfer phase, CRS replication from the source cluster 326 to the destination cluster 328 continues, and the Orchestrator 342 continues to poll the status of the “Create+Auto Initialize” operation. If this operation fails with a retryable error, the Orchestrator 342 retries the “Create+Auto Initialize” operation. The “Create+Auto Initialize” is an idempotent operation and the Orchestrator 342 can continue calling this API (425B, FIG. 4A) without performing any cleanup or undo steps.
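The retry behavior described above relies on idempotency: because repeating the operation has the same effect as running it once, the caller can simply invoke it again after a retryable error. A minimal sketch of that pattern follows. `RetryableError`, `run_idempotent` and the `max_attempts` limit are hypothetical names and parameters introduced for illustration; they are not the API of module 425B.

```python
class RetryableError(Exception):
    """Hypothetical marker for errors that are safe to retry."""


def run_idempotent(operation, max_attempts=5):
    """Call `operation` until it succeeds, retrying only on RetryableError.

    No cleanup or undo step runs between attempts, which is only safe
    because the operation is assumed to be idempotent. Any other exception
    is treated as fatal and propagates to the caller immediately.
    Returns a (result, attempts_used) tuple.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), attempt
        except RetryableError as err:
            last_error = err
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

A bounded attempt count is a design choice of this sketch; the disclosure does not state whether retries are limited.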

At this stage of the transfer process, the migrate operation is ready for a cutover phase. If an auto-cutover option is off, then the process waits for a user input to invoke the cutover phase. If the auto-cutover is on, the migrate operation state goes from “Transfer” to “Cutover phase.” As an example, the auto-cutover may be enabled as a default setting. It is noteworthy that while waiting to start the cutover phase, the back-to-back transfer workflow continues. The Orchestrator 342 continues to poll on the operation UUID created for the “Create+Auto Initialize” operation to monitor the progress or status of the background transfers. If the operation fails and the failure is not fatal, the Orchestrator 342 retries the “Create+Auto Initialize” operation.

Referring now to FIG. 5B, the transfer phase entry begins in block B514. In block B516, the source volumes 428A/430A are initialized based on the migrate policy that was created by module 425B for the migrate operation during the setup phase.

In block B518, a baseline transfer of the source volumes 428A/430A for each node (e.g., 333A and 333B, FIG. 4B) is executed. In one aspect, to execute the baseline transfer, a snapshot (i.e., a point in time copy) of the source volumes 428A/430A is taken and transferred to the destination nodes (e.g., 335A/335B, FIG. 4B). The baseline transfer may be executed using the asynchronous engine 348A. Once the baseline transfer is completed for all source nodes, the process executes an incremental transfer of the changes made at the source volumes since the baseline transfer. This may be executed by taking an incremental snapshot of the source volumes. Thereafter, in block B522, the process determines if all the volumes are “ready for cut-over”. In one aspect, this is based on completing the baseline and incremental transfer. Once the “ready for cut-over” stage is reached, the migrate operation moves to a pre-commit phase of the cut-over phase. If the auto-cutover option is enabled, then the migrate operation automatically moves to the pre-commit phase in block B524; otherwise, a user input is used to move to the pre-commit phase. It is noteworthy that the system continues to allow taking snapshots of the source volumes during the transfer phase, even after the baseline transfer and incremental transfer. These snapshots are transferred during the pre-commit phase, when the source Vserver access is disabled, as described below.

Cutover phase: In one aspect, the cutover phase may be user initiated or initiated automatically when the auto-cutover option is enabled. The auto-cutover can be enabled or disabled by setting a bit value associated with the source Vserver 320, a user or any other system. The auto-cutover setting is available to the Orchestrator thread 342 to initiate the cut-over phase. In one aspect, the Orchestrator 342 starts the cutover phase, which has multiple stages, e.g., a pre-commit stage to prepare the source cluster 326 and the destination cluster 328 to enter an “outage window”; a commit stage when the outage window occurs with no user system data access; a source commit stage that prevents access to data from the source cluster 326 in preparation to transfer control over to a destination node in the destination cluster 328; a destination commit stage to restore access to the migrated Vserver 324 from the destination cluster 328; and a post-commit stage when access is restored via the destination cluster 328. The following provides a description of the various stages of the cutover phase:

Pre-commit stage 503: FIG. 5C shows a process flow 503 for the pre-commit stage/phase of the cut-over phase for migrating the source Vserver 320 to the destination cluster 328 as Vserver 324. The pre-commit stage transfers the mirroring relationships to the synchronous engine 349B to ensure a short cutover window. The term cutover window means a duration during which the cut-over phase needs to be completed for the migrate operation to succeed. The pre-commit stage begins with starting a “Pre-Commit Timer,” shown as the source timer 412A (FIG. 4A) on the source cluster 326. The source timer 412A is set to X minutes, e.g., 120 minutes, within which this stage has to be completed. The source timer 412A detects cases where the pre-commit stage fails or is likely to fail and hence cannot progress to the next stage, i.e., the commit stage.

The source timer 412A is disabled when the process moves to the commit stage, when it moves back to the transfer phase due to errors, or if the migrate operation fails. When the source timer 412A expires, the migrate operation is failed, and the pre-commit steps are undone. The migrate operation state is updated to indicate a “Migrate failed” state. During pre-commit, the source Vserver 320 configuration is locked, i.e., no changes can be made after the lock is in place. The process waits for pending configuration replication to complete and for the configuration changes to apply on the destination cluster 328. If the source Vserver 320 contains volumes that are mirroring destinations, then the configuration update for those volumes is postponed till the post cutover stage, described below.

The Orchestrator 342 co-ordinates the various calls for the pre-commit stage. These calls can be used to perform various pre-commit stage tasks, e.g., if a mirroring subsystem chooses not to replicate snapshots when it transitions to the synchronous engine 349A, it can perform steps to disallow snapshot creation at this stage; for mirroring to the cloud layer 136, it can choose to quiesce and abort transfers to the cloud layer 136; and if the source Vserver 320 is the destination of another mirroring relationship, it can quiesce and abort the relationship from its source Vserver.

Once the various subsystems have completed pre-commit tasks, the group control module 346B is called to transfer from the asynchronous engine 348A to the synchronous engine 349A. This group control workflow includes stopping and waiting for completion of a previous mirroring workflow which was performing back-to-back transfers; starting a new group workflow to perform the following operations independently across volumes and nodes: wait for ongoing snapshot transfers to complete; perform any additional back-to-back transfers if the destination volume has not converged to the source volume; perform a last asynchronous transfer; and transition from an asynchronous to a synchronous state and wait for the volume to reach an “INSYNC” state. Once all the destination cluster nodes (e.g., 335A/335B, FIG. 4B) reach the “INSYNC” state, the group control module declares to the Orchestrator 342 that the pre-commit stage is complete. All NFS delegations are revoked to prepare for the commit stage of the cut-over phase, as described below in detail.

Referring now to FIG. 5C, the pre-commit stage is entered in block B530, after the transfer phase of the migrate operation is successfully completed. In block B532, the orchestrator thread 342 locks the source Vserver 320 configuration to prevent any changes. In block B534, any configuration updates that are pending at the source cluster 326 are applied to the destination Vserver 324. In block B536, non-migrate operation related snapshot creation is disabled at the source Vserver 320, any mirroring relationships that mirror source volumes to the cloud layer 136 are paused and any other mirroring relationships in which the source Vserver 320 volumes are the destination or source of a mirroring operation are paused. This ensures that the configuration and data are not likely to change at the source cluster 326. Thereafter, in block B538, all asynchronous transfers of source Vserver 320 snapshots to the destination cluster 328 are completed. The transfer process is then moved to the synchronous engine 349A that synchronously transfers information for the plurality of nodes 333A/333B at the source cluster 326. The process determines if all the source volumes are ready for cut-over within a cut-over duration. If yes, then the status of all the volumes is updated to “INSYNC” in block B540. This information is stored at RDB tables 432A/432B. If the volumes are not ready for the commit phase, the pre-commit stage is failed. If successful, the migrate operation moves to the commit stage that is described below in detail with respect to FIG. 5D.

Cutover commit stage 505: FIG. 5D shows a process 505 for the cutover commit stage that is intended to complete this stage within a “total outage window”, i.e., a duration when client system I/Os are delayed for processing. A persistent state for the commit stage is maintained at both the clusters 326 and 328, e.g., at the RDB tables 432A/432B. The commit stage begins in block B544, after a successful pre-commit stage, described above with respect to FIG. 5C.

In block B546, an auto-resync feature is disabled. This stops execution of any mirroring relationships associated with the source volumes. The source timer 412A is started in the source cluster 326 owning node and then in block B548, access to the source Vserver 320 is stopped. If the source timer 412A expires, the migrate operation is failed and the source Vserver 320 is restarted to process I/O requests.

In block B550, the destination cut-over timer 412B is started. In block B552, the group control module 346B is started to control the workflow of a commit stage idempotent operation. The workflow includes the following: drain and fence any I/O on the source cluster 326; replicate any content stored outside the source volumes to the destination cluster; take a final snapshot of the source volumes, prior to allowing new I/Os to be processed from the destination cluster 328, which enables data integrity checks between the source cluster 326 and the destination cluster 328; and convert the destination volumes from a read-only configuration to read/write volumes to allow reads and writes from the destination cluster 328. It is noteworthy that the destination Vserver 324 is not yet operational; therefore, client generated I/Os are still not processed from the destination cluster 328. At this point, if a failure occurs, the source Vserver 320 can be restarted. If the migrate operation does not transition to a next stage, i.e., the PONR (Point of No Return) stage, within a certain duration, the destination timer 412B expires and the source Vserver 320 is restarted on the source cluster 326.

Once the cutover commit stage is completed in block B554, the Orchestrator 342 is notified. To handle any missed notification, the cutover completion status is also polled by the Orchestrator 342. For any errors that can be retried, the Orchestrator 342 can restart the migration from the beginning of the transfer phase. If the commit stage fails, the source Vserver 320 is restarted and the source cutover timer 412A is disabled. The migrate operation can then be restarted from the transfer phase. The destination volumes 428B are reconverted to DP (i.e., read-only) volumes, if they were converted to read/write configuration during the commit stage.

Post Commit stage 507: FIG. 5E shows the post commit stage 507 that begins in block B564, according to one aspect of the present disclosure. During the post commit stage 507, in block B566, the migrate operation state is updated to the PONR state on the source and destination cluster RDBs 432A/432B, respectively, to prevent the source cluster 326 from starting the source Vserver 320 again. In block B568, the source cutover commit timer 412A and the destination timer 412B are cancelled. In block B570, the destination Vserver 324 is started on a destination cluster node (e.g., 335A or 335B, FIG. 4B). Thereafter, in block B572, the migrate operation moves to a post cut-over phase and then a final clean-up stage that are both described below in detail with respect to FIG. 5F.

Post Cutover phase and Final Cleanup phase 509: FIG. 5F shows the process 509 for the post cut-over phase and the final cleanup phase of the migrate operation, according to one aspect of the present disclosure. Although both phases are shown within FIG. 5F, the final cleanup phase occurs after completion of the post cut-over phase that begins in block B580. In block B582, all mirroring relationships are first deleted on the destination cluster 328. Thereafter, in block B584, the snapshots created for the migrate operation on the destination cluster 328 are deleted, except for the final snapshot. In block B586, the final configuration retrieved from the source Vserver 320 is applied to the destination Vserver 324. If there is an error during post cutover, the Orchestrator 342 retries to fix the error. If the source Vserver 320 contained any volumes that were configured as destination volumes for a mirroring relationship, then the mirroring objects that were not applied on the destination cluster 328 in the earlier phases of the migrate operation are applied. Thereafter, the final cleanup phase is started in block B588.

In one aspect, the final cleanup is controlled by an “auto-source-cleanup” setting in the migrate policy. If the “auto-source-cleanup” option is not set, the process stays in this phase till the client system invokes a “source-cleanup” operation. Once the client system invokes the “source-cleanup” operation or if the auto-source-cleanup option is set, the operation moves to a final cleanup phase in block B588. Thereafter, in block B590, all mirroring relationships of the source volumes at the source cluster 326 are deleted from RDBs 432A/432B. All snapshots taken of the source Vserver 320 are deleted.

In block B592, data integrity checks are performed to ensure that the final snapshot of the destination Vserver 324 is the same as that of the source Vserver 320. This enables the source Vserver 320 to be brought back online if there is a failure. Thereafter, the source volumes 428A/430A, the LIFs associated with the source Vserver 320, any other objects created for or by the source Vserver 320 and the source Vserver 320 are deleted. In block B594, the final snapshot of the destination volumes 428B/430B is also deleted. The status of the migrate operation is then updated in block B596.

State Diagram 600: FIG. 6 shows a state diagram 600 for tracking the various phases of the migrate operation described above in detail. The migration operation states are tracked by a state control logic 345A (FIG. 4A) or by any other module. As mentioned above, the migration operation states are persistently stored at both the source cluster 326 and the destination cluster 328, so that if the migrate operation is paused, failed or aborted, appropriate action can be taken.

The migration operation begins with a pre-check state 602, and after the pre-check, the setup phase state 604 is reached, described above with respect to FIG. 5A. Once the setup phase is complete, the transfer state 606 is reached, described above with respect to FIG. 5B. After the transfer phase is completed, the migrate operation transitions to a “ready for cut-over” state 608, also described above with respect to FIG. 5B. When all the source volumes are ready for cut-over, the migrate operation transitions to the cut-over phase 626. Within this phase there are multiple stages/states, namely the pre-commit state 628 described above with respect to FIG. 5C, the source commit state 630 and the destination commit state 632, described above with respect to FIG. 5D. After the destination commit state, the migrate operation transitions to the post commit state 634. State 636 indicates the completion of the post commit state described above with respect to FIG. 5E. The migrate operation then moves to the post-cutover state 638 and the source cleanup (or final cleanup) state 640, both described above with respect to FIG. 5F. State 642 indicates successful completion of the migrate operation, while state 644 indicates a failure. The migrate operation failure is described below with respect to FIGS. 7E and 7F.

The state diagram 600 also shows the pause state 610 that indicates the migrate operation has been paused. The start of the pause stage is indicated by state 612, while a successful pause operation is shown by state 616. If the pause attempt fails, then it is shown by state 614. Details of the pause process are provided below with respect to FIG. 7A.

In one aspect, the migrate operation can be aborted, as shown by state 618. The abort state can be reached from the paused state 616, the pause failed state 614 or other failed states. It is noteworthy that the abort state can be reached from other states as well, e.g., the abort state may be reached before reaching the cut-over stage. State 620 indicates that the abort process has started, while state 622 indicates a successful abort operation. If an abort attempt fails, it is indicated by state 624.
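For illustration, the states of the state diagram 600 can be modeled as a small transition table. The following is a sketch under stated assumptions, not the logic of the state control module 345A: the enum values reuse the reference numerals from FIG. 6 for readability, the umbrella states 610, 618 and 626 and the completion marker 636 are omitted, and the exact set of permitted edges (in particular which states may fail, pause or abort) is an assumption inferred from the description. All identifiers are hypothetical.

```python
from enum import Enum


class State(Enum):
    PRE_CHECK = 602
    SETUP = 604
    TRANSFER = 606
    READY_FOR_CUTOVER = 608
    PAUSING = 612
    PAUSE_FAILED = 614
    PAUSED = 616
    ABORTING = 620
    ABORTED = 622
    ABORT_FAILED = 624
    PRE_COMMIT = 628
    SOURCE_COMMIT = 630
    DESTINATION_COMMIT = 632
    POST_COMMIT = 634
    POST_CUTOVER = 638
    SOURCE_CLEANUP = 640
    SUCCEEDED = 642
    FAILED = 644


# Normal forward path of a migrate operation.
FORWARD = [
    State.PRE_CHECK, State.SETUP, State.TRANSFER, State.READY_FOR_CUTOVER,
    State.PRE_COMMIT, State.SOURCE_COMMIT, State.DESTINATION_COMMIT,
    State.POST_COMMIT, State.POST_CUTOVER, State.SOURCE_CLEANUP,
    State.SUCCEEDED,
]
# Assumption: pause is offered only before the cutover commit stage.
PAUSABLE = (State.SETUP, State.TRANSFER, State.READY_FOR_CUTOVER,
            State.PRE_COMMIT)
# Assumption: abort is offered from paused/failed states and pre-cutover states.
ABORTABLE = (State.SETUP, State.TRANSFER, State.READY_FOR_CUTOVER,
             State.PAUSED, State.PAUSE_FAILED, State.FAILED)


def build_transitions():
    """Build a mapping from each state to the set of states it may move to."""
    allowed = {state: set() for state in State}
    for src, dst in zip(FORWARD, FORWARD[1:]):
        allowed[src].add(dst)
    for state in PAUSABLE:
        allowed[state].add(State.PAUSING)
    allowed[State.PAUSING] |= {State.PAUSED, State.PAUSE_FAILED}
    allowed[State.PAUSED].add(State.SETUP)  # resume restarts from setup
    for state in ABORTABLE:
        allowed[state].add(State.ABORTING)
    allowed[State.ABORTING] |= {State.ABORTED, State.ABORT_FAILED}
    # Assumption: failure is modeled only before the point of no return.
    for state in FORWARD[:FORWARD.index(State.POST_COMMIT)]:
        allowed[state].add(State.FAILED)
    return allowed


def can_move(allowed, src, dst):
    """Return True if the table permits moving from `src` to `dst`."""
    return dst in allowed[src]
```

Persisting the current `State` value at both clusters, as the description requires, would let either side consult the same table after a restart.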

Migrate Pause Operation 700: FIG. 7A shows the migration pause operation 700, according to one aspect of the present disclosure. Depending upon the size of the source Vserver 320 that is migrated, the migrate operation may be a long operation. The technology disclosed herein allows a client system to pause the migrate operation for one or more reasons, e.g., to perform operations that are not allowed while the migration is in progress, to reduce network usage, or for any other reason. The migrate pause option is available before the cutover commit stage described above. When the migration operation enters a “pausing state” (612, FIG. 6), data replication and configuration information replication between the source cluster 326 and the destination cluster 328 are paused. However, objects created on the destination cluster 328 such as volumes, snapshots, LIFs and others are left intact. The destination Vserver 324 on the destination owning node (e.g., 335A) remains locked, and no modification to the destination Vserver 324 is permitted. The source Vserver 320 is unlocked, for example, to enable volume deletion/addition/move, LIF changes and other operations. The mirroring relationships between the source cluster 326 and destination cluster 328 are deleted. It is noteworthy that the Orchestrator 342 and other migrate operation threads check for a pending pause request prior to starting any extensive operation or when the Orchestrator 342 is restarted.

In one aspect, to pause the migration operation, a command is received in block B702. The RDB tables 432A/432B are updated to indicate a pending migrate pause status. During this state, no other migrate operation is allowed on the source Vserver 320. In block B704, during this state, any data replication between the source cluster 326 and destination cluster 328 is aborted. If the CG 331 is still performing initialization (i.e., a baseline transfer, as described above), it terminates the ongoing initialization workflow. To stop configuration replication, a state of the Vserver CRS stream is set to “down” on both the source cluster 326 and the destination cluster 328.

If the migrate state is in the cutover pre-commit stage, then in block B706, the steps already performed during the pre-commit stage are undone. The progress of undoing the steps from this stage is tracked persistently so that if the Orchestrator 342 or any other thread becomes unresponsive, the pause operation remains idempotent. If the source Vserver 320 contains mirroring destinations, the mirroring relationship is resumed and the source Vserver 320 is unlocked in block B708. The CG 331 mirroring relationships are removed, and any snapshots taken prior to the pausing state are preserved. Thereafter, in block B710, the migrate operation state is moved to a “Paused” state (616, FIG. 6). During the “Paused” state, only “Resume” or “Abort” operations can be performed.

A previously paused migrate operation is resumed using a “Vserver migrate resume” operation. The resume operation is in effect the idempotent version of the “Vserver migrate start” operation. It performs all the operations performed for the source Vserver 320 to restart the migration. One difference between a new Vserver migrate operation and a Vserver resume operation is that some or all the required objects on the destination cluster 328 may already be present; hence the objects at the destination cluster are reconciled with the source cluster 326 by the CRS 344B. For the resume operation, the migrate operation restarts from the setup phase, but it does not recopy the entire data and configuration information; instead, only an incremental copy operation is used that replicates changed information. This saves time and is hence more efficient.

Cloud Backup Process 726: The source Vserver 320 may have one or more volumes (e.g., 428A/430A) that may have a cloud backup relationship. This means that the snapshots of the volumes are backed up to a data store in the cloud layer 136.

FIG. 7B shows a process 726 for handling the cloud backup relationships during a migrate operation as described above. In block B728, the process first determines that one of the source volumes 428A/430A has a cloud backup relationship. This information is obtained from volume configuration data that is accessible to the Orchestrator 342. In block B730, the migrate operation checks if the successful completion of the migrate operation will result in a capacity-based license violation and whether the destination cluster 328 has network access to the cloud layer 136. This information is stored as cluster configuration data and is available to the Orchestrator 342. In block B730, the migration is failed if the destination cluster 328 does not have a license to mirror the volumes migrated from the source cluster 326 to the cloud layer 136.

If the destination cluster 328 has the appropriate license, then in block B732, the data transfer to the cloud object storage continues during the migration operation till the source Vserver 320 reaches the cutover pre-commit phase. It is noteworthy that cloud storage uses a different data format on a cloud object store compared to the storage system 120. For example, L0 (level 0) volume blocks that store data are packed together in a single cloud block. The mapping from a virtual volume block number (VVBN) to the cloud block number (CBN) is tracked in a metafile, the “vmap metafile”.

In block B734, transfer to the cloud layer 136 is paused using a “quiesce” operation on the cloud backup relationship. The cloud backup specific metafiles are rebuilt on the destination cluster 328 and no metafile is replicated to the destination cluster 328. In the post cutover phase, a new mapping from VVBN to cloud block number is constructed.

Volume Placement (736): FIG. 7C shows a process 736 for volume placement, according to one aspect of the present disclosure. The volume placement occurs during the setup phase of the migrate operation, described above with respect to FIG. 5A. In one aspect, the volume placement at the destination cluster 328 is based on a list of qualified aggregates. If a source volume (428A/430A) is configured with a space guarantee, then only a destination aggregate with enough storage room is used. The destination aggregate is picked from a list of qualified aggregates based on tracking the number of IOPS for the source volumes 428A/430A processed by the source Vserver 320 at the source cluster 326 (block B746). This information is managed by the management module 134 that retrieves IOPS data for each volume from the storage system 120 and, if applicable, the cloud layer 136.

The available headroom on the destination aggregates is determined in block B742. This is based on tracking, by the management module 134, the latency and a maximum number of IOPS (and/or utilization) processed by the destination aggregates. In this context, latency means a delay in processing an I/O request and may be measured using different metrics, for example, a response time. Headroom in this context means available performance capacity of a destination aggregate at any given time. Headroom can be based on a relationship between latency and a maximum number of IOPS (or utilization) that can be processed by each destination aggregate. At a high level, the available headroom at any given time can be defined by the following relationship:

$\text{Headroom} = \frac{\text{Optimal Point} - \text{Operational Point}}{\text{Optimal Point}}$

A latency v. IOPS curve is generated, where latency is plotted on the Y-axis and maximum IOPS (or utilization) is plotted on the X-axis. An optimal point, after which latency shows a rapid increase, represents maximum (or optimum) utilization of a resource, up to which an increase in workload is associated with higher throughput gains than latency increase. Beyond the optimal point, if the workload increases at the destination aggregate, the throughput gain or utilization increase is smaller than the increase in latency. An operational point shows a current throughput of a destination aggregate.
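As a small worked illustration of the headroom relationship, the following sketch computes headroom from an optimal point and an operational point expressed in the same unit (IOPS here). The function name and the sample numbers are hypothetical and chosen only to show the arithmetic.

```python
def headroom(optimal_point: float, operational_point: float) -> float:
    """Fraction of performance capacity still available on an aggregate.

    Both arguments must use the same unit (e.g., IOPS or utilization).
    A negative result means the aggregate operates past its optimal point.
    """
    if optimal_point <= 0:
        raise ValueError("optimal_point must be positive")
    return (optimal_point - operational_point) / optimal_point


# Example: optimal point at 10,000 IOPS, currently serving 6,500 IOPS,
# so (10000 - 6500) / 10000 of the performance capacity remains.
example = headroom(10_000, 6_500)
```

An aggregate running exactly at its optimal point has a headroom of zero under this relationship.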

In block B744, the destination aggregate is selected based on the tracked IOPS, available headroom, size of the source volume and the available space on the destination aggregate. If the source volume is thin-provisioned, then the size of the source volume could be larger than the actual space used by the volume. In that case, the actual space used is considered for volume placement, instead of the presented volume size. The volume placement operation will use the logical volume size plus extra space required for any space efficiency violation when it looks for a destination aggregate.
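One way the selection inputs named above (tracked IOPS, headroom, required space and available space) could be combined is sketched below. The ranking rule is an assumption made for illustration: candidates that lack space or that would be pushed past their optimal point are filtered out, and the remaining aggregate with the most headroom is chosen. The disclosure does not specify this rule, and `Aggregate` and `pick_aggregate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Aggregate:
    name: str
    free_bytes: int       # available space on the aggregate
    optimal_iops: float   # optimal point of the latency v. IOPS curve
    current_iops: float   # operational point

    @property
    def headroom(self) -> float:
        return (self.optimal_iops - self.current_iops) / self.optimal_iops


def pick_aggregate(candidates: List[Aggregate],
                   required_bytes: int,
                   volume_iops: float) -> Optional[Aggregate]:
    """Pick a destination aggregate for one source volume, or None.

    Assumed policy: keep only aggregates with enough free space and enough
    performance capacity for the volume's tracked IOPS, then prefer the
    aggregate with the largest headroom.
    """
    fits = [
        a for a in candidates
        if a.free_bytes >= required_bytes
        and a.current_iops + volume_iops <= a.optimal_iops
    ]
    return max(fits, key=lambda a: a.headroom, default=None)
```

For a thin-provisioned volume, `required_bytes` would carry the space actually used rather than the presented size, in line with the paragraph above.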

LIF Placement 746: As part of the CRS replication, the data LIFs on the source Vserver 320 are replicated to the destination cluster 328. FIG. 7D shows the process for creating LIFs on the destination cluster 328, according to one aspect of the present disclosure. In block B748, one of the ports (e.g., a port at the network adapter 310, FIG. 3A) on the destination cluster 328 in each IP address space that has L2 (Level or Layer 2) connectivity to a source cluster port in the same subnet as the destination data LIF port is selected. L2 in this context is a broadcast Media Access Control (MAC) level network. In block B750, a LIF manager (not shown) performs an L2 ping from the destination port to the source port. This ensures that the selected destination port is reachable, and there will be no data outage once the migrate operation is complete. The external clients 108 will also be able to communicate through the selected destination port.

If the destination data port has no L2 connectivity to the source data port, then in block B752, the LIF manager checks if there is a subnet object on the destination cluster 328 that maps to the same subnet of the source LIF. If such a subnet object exists, then it picks any port from a broadcast domain associated with the source subnet to create a destination LIF. Prior to migration, any IP address space and/or VLAN are created on the destination cluster 328. The number of LIFs created on the destination Vserver 324 is the same as that on the source Vserver 320. Any additional LIFs that need to be created, if the topology of the destination cluster 328 is different from the source cluster 326, are created after the migration is complete. It is noteworthy that the LIF connectivity checks described herein are optional and the migrate operation can be executed without conducting the LIF connectivity checks. Furthermore, if the source cluster 326 and the destination cluster 328 are not in the same L2 network, the migrate operation can be executed if connectivity is available via an L3 (Level or Layer 3) network that is governed by managing network transmission using IP addresses. As an example, the BGP (Border Gateway Protocol) and virtual IP (VIP) address can be used for LIF migration. The VIP LIFs, being virtual, are not tied to any particular node/port. The prerequisite is the existence of a BGP LIF on each node in the destination cluster 328. BGP is a standardized exterior gateway protocol designed to exchange routing and reachability information among autonomous systems on the Internet. BGP is classified as a path-vector routing protocol, and it makes routing decisions based on paths, configured network policies, or rule-sets.
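As an illustrative sketch only, the port selection of blocks B748 to B752 can be expressed as an L2 reachability check followed by a fallback to a subnet object. The Port class, the l2_ping callable and the subnet_to_domain mapping are hypothetical stand-ins for the LIF manager's internal interfaces, which are not detailed in the present disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Port:
    name: str
    ipspace: str            # IP address space the port belongs to
    broadcast_domain: str


def select_destination_port(
    source_port: str,
    source_subnet: str,
    ipspace: str,
    dest_ports: Iterable[Port],
    l2_ping: Callable[[str, str], bool],
    subnet_to_domain: Dict[str, str],
) -> Optional[Port]:
    """Choose a destination port for a replicated data LIF.

    l2_ping(dest_port_name, source_port_name) is assumed to return True
    when the source port answers an L2 ping sent from the destination port.
    subnet_to_domain models the subnet objects on the destination cluster.
    """
    candidates = [p for p in dest_ports if p.ipspace == ipspace]

    # First choice: a port with L2 connectivity to the source port.
    for port in candidates:
        if l2_ping(port.name, source_port):
            return port

    # Fallback: a subnet object that maps to the source LIF's subnet;
    # any port in the associated broadcast domain is acceptable.
    domain = subnet_to_domain.get(source_subnet)
    if domain is not None:
        for port in candidates:
            if port.broadcast_domain == domain:
                return port

    return None  # no suitable port found by either check
```

Because the connectivity checks are optional, a caller in this sketch could treat a None result as a reason to defer LIF placement rather than as a failure of the migrate operation.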

Migrate Operation Failure Handling 701: FIG. 7E shows an example of a process flow 701 for handling different failure conditions that may occur during the various phases/stages of the migrate operation described above. In one aspect, an inter-cluster network failure may be detected in block B703, while the migration operation is in progress. The inter-cluster network failure may be detected by a network access layer (e.g., 806, FIG. 8). The inter-cluster network failure may result in a degraded or lost network connection between the source cluster 326 and the destination cluster 328. The failure may be detected by or reported to the failure modules 406A/406B, depending on which cluster or node detects the network failure. In block B705, the process determines if the migration is in the cut-over phase. This information is available from the migrate operation state (FIG. 6) that is stored at RDBs 432A/432B. If yes, then the source Vserver 320 is restarted in block B707, if the PONR stage has not been reached. If the migrate operation is not in the cut-over phase, then in block B709, a job object is created to monitor the health of the inter-cluster communication. The migrate operation is restarted and the process moves to block B729 that is described below in detail.

As another example, a process involved with the migrate operation may fail in block B711. In block B713, the failure module 406A/406B determines if the failed process is the orchestrator 342. If not, then the failed process is restarted at a healthy node in block B715. Thereafter, in block B717, any outstanding requests for the failed process are processed and the migrate operation continues. If the failed process is the orchestrator 342, then the process moves to block B721, described below.

In block B719, a failure is detected at a destination node (e.g., 335A-335B). The orchestrator 342 is started at a new healthy node in block B721. The migrate operation then waits for the resources at the new node to become available in block B723. The process then moves to block B729, also described below in detail.

In yet another example, the process determines if there is an intermittent failure in block B725. If yes, the process moves to block B733, described below in detail. If not, then the failure is reported to a client system in block B727 and the process moves to block B729, described below.

In another example, a network error may occur within the source cluster 326 or the destination cluster 328 in block B731. The network error may occur due to software/hardware failure within the affected cluster. In block B733, the migrate operation retries a failing idempotent task a certain number of times (e.g., N times). If successful, the migrate operation continues; otherwise, the process moves to block B729.
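The bounded retry of block B733 can be illustrated with the minimal sketch below. The function name retry_idempotent, the fixed delay between attempts and the injectable sleep parameter are assumptions for illustration; the disclosure only states that an idempotent task is retried up to N times.

```python
import time
from typing import Any, Callable, Tuple


def retry_idempotent(task: Callable[[], Any],
                     max_attempts: int,
                     delay_seconds: float = 0.0,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Tuple[bool, Any]:
    """Run an idempotent task up to max_attempts times.

    Returns (True, result) on the first success, or (False, last_error)
    once every attempt has failed. The task must be safe to repeat,
    because a failed attempt may have partially executed.
    """
    last_error: Any = None
    for attempt in range(1, max_attempts + 1):
        try:
            return True, task()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_seconds)
    return False, last_error
```

A False result corresponds to the case in which the process moves to block B729 to examine the state of the migrate operation.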

In block B729, a current status of the migrate operation is obtained from the state diagram of FIG. 6 that is updated and stored at RDB 432A/432B. If the migrate operation is in the cut-over phase (B737), then the cut-over tasks are undone in block B739 and the process moves to block B743. If the migrate operation is in the cut-over pre-commit stage (B741), then the pre-commit steps are undone in block B743. If the migrate operation is in the transfer phase (B745), then the transfer phase tasks are undone and the process moves to block B751. If the migrate operation is in the setup configuration phase (B749), then the setup tasks are undone in block B751 and the migrate operation is restarted in block B753.

If the migrate operation is in the post cut-over phase (block B759), then the post cut-over tasks are undone in block B761 and the migrate operation is restarted from the post-cut-over phase in block B763. If the migrate operation is in the final (or source) cleanup stage (B765), then the cleanup tasks are undone in block B767 and the migrate operation is restarted from the cleanup stage.
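One possible reading of the recovery flow that starts at block B729 is sketched below. It assumes that the undo steps cascade, so that a failure in a later phase also undoes the tasks of the earlier phases before the migrate operation is restarted, and that the post cut-over and cleanup phases are undone and resumed in place. The phase labels and the function plan_recovery are hypothetical names used only for this illustration.

```python
from typing import List, Tuple

# Phases whose tasks are undone in sequence, latest phase first.
# Assumption: undoing a later phase falls through to the earlier ones.
ROLLBACK_ORDER = ["cutover", "cutover_pre_commit", "transfer", "setup"]

# Phases that are undone and then restarted from the same phase.
RESUME_IN_PLACE = ("post_cutover", "cleanup")


def plan_recovery(phase: str) -> Tuple[List[str], str]:
    """Return (phases to undo in order, phase to restart from)."""
    if phase in ROLLBACK_ORDER:
        start = ROLLBACK_ORDER.index(phase)
        # Undo this phase and every earlier phase, then restart the
        # migrate operation from the setup phase.
        return ROLLBACK_ORDER[start:], "setup"
    if phase in RESUME_IN_PLACE:
        return [phase], phase
    raise ValueError(f"unknown migrate phase: {phase}")


if __name__ == "__main__":
    for current in ("cutover", "transfer", "post_cutover"):
        print(current, plan_recovery(current))
```

The persisted migrate operation state would supply the phase argument in such a design, so that recovery can be planned after a restart of the orchestrator.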

FIG. 7F shows another process flow 714 to handle the various failure conditions that may occur during a migrate operation. The failure handling is executed by the failure module 406A/406B in conjunction with the other modules, e.g., the orchestrator 342. The migrate operation enters a failed stage when the migrate operation cannot be auto-healed due to failures that may require manual intervention. After an error is fixed, a client system (e.g., 108, FIG. 1) can resume the migrate operation or can abort the migrate operation. The migrate failure handling is similar to the migrate pausing process described above. In another aspect, the failure handling state can be combined with the pause handling operations for failures that occurred prior to the cutover phase. In one aspect, failure handling depends on the state of the migrate operation when the failure occurred, as described below with respect to blocks B716, B718, B720, B722 and B724 of FIG. 7F.

Setup Phase Failure Handling (B716): If the migrate operation fails during a pre-check operation, the failure is reported to the user. If the migrate operation fails during an asynchronous pre-check stage, the operation state at the RDB is updated to the “migrate_failed” state with the appropriate reason. If the migrate operation fails after the destination Vserver 324 is created, then the destination Vserver 324 is not deleted but it stays locked. If the migrate operation fails during volume creation at the destination cluster 328, the CRS streams are aborted and the migrate operation state is updated to the “migrate_failed” state.

Transfer Phase (B718): If the migrate operation failed during this phase, then the transfer operation to transfer snapshots of the source volumes 428A/430A is aborted, the mirroring relationships are released, the CRS streams are aborted, the snapshots taken during the transfer phase are retained and the migrate operation state is updated to migrate_failed state.

Cutover Pre-Commit (B720): If the migrate operation failed during this stage of the migrate operation, then a transfer operation transferring source volume 428A/430A snapshots is aborted, the mirroring relationships are released, the CRS streams are aborted, the snapshots taken before the failure are retained and the migrate operation state is updated to migrate_failed. If the source Vserver 320 is locked, then it is unlocked.

Cutover Commit (B722): If a failure is triggered on the source cluster 326, e.g., the source cutover timer 412A expired, then PONR updates are disallowed from the destination cluster 328, drain and fence steps are undone on the source cluster 326, if they were already performed, and the mirroring relationships are removed. The source Vserver 320 is restarted and unlocked. If the destination cluster 328 cannot communicate with the source cluster 326 to stop the source Vserver, then the source cluster 326 performs its recovery. The destination cluster 328 deletes all the snapshots for the migrate operation, deletes the mirroring relationships, and the migrate operation state is updated to migrate_failed state. If the commit stage returns an error, then the source cluster 326 performs the recovery based on the source cutover timer 412A. The snapshots prior to the failure are retained and any cutover commit steps are undone. If any of the destination volumes 428B/430B were configured as read/write volumes, they are rolled back to a read-only state. The final snapshot is also deleted.

Cutover Post Commit (B724): If a PONR state update fails on the source cluster 326, then the source cluster 326 performs its error recovery as explained above. The destination cluster 328 performs the same error recovery as described above. If a PONR update request/response timed out, then the destination cluster 328 assumes that the PONR update didn't make it to the source cluster 326. This will prevent the source Vserver 320 from being brought online at both the source and the destination clusters. If the PONR update fails at the destination cluster 328, the source cluster 326 will not start the source Vserver 320.

In one aspect, various methods and systems for migrating a Vserver are provided. One method includes generating (B502), by the processor, a consistency group (CG) (e.g., 331, FIG. 3B) having a plurality of source storage volumes (330, FIG. 3B) managed by a source Vserver (320, FIG. 3B) of a source cluster (326, FIG. 3B) for a migrate operation to migrate the plurality of the source storage volumes as a group to a plurality of destination storage volumes (332, FIG. 3B) of a destination cluster (328, FIG. 3B); establishing (B504, FIG. 5A), by the processor, a mirroring relationship between the source cluster and the destination cluster for managing asynchronous transfer of the plurality of source storage volumes in the CG to the plurality of destination storage volumes during a transfer phase of the migrate operation; replicating (B518, FIG. 5B), by the processor, a logical interface of the source cluster to the destination cluster, the logical interface providing a network address to access the source cluster; and automatically selecting (FIG. 7C), by the processor, a destination port at the destination cluster, associated with the replicated logical interface. The method further includes determining, by the processor, an inter-cluster failure (B703, FIG. 7E) between the source cluster and the destination cluster occurring while the migrate operation is at a point of no return (PONR); and restarting (B707, FIG. 7E), by the processor, the source Vserver at the source cluster and the migrate operation.

The method further includes undoing (B751, FIG. 7E), by the processor, any tasks executed during a setup phase of the migrate operation, in response to a failure condition occurring during the setup phase; and restarting (B753, FIG. 7E), by the processor, the migrate operation. The method also includes undoing (B718, FIG. 7F), by the processor, any tasks executed during a transfer phase and a setup phase of the migrate operation, in response to a failure condition occurring during the transfer phase; and restarting, by the processor, the migrate operation.

The method also includes undoing (B720, FIG. 7F), by the processor, any tasks executed during a cut-over pre-commit phase, a transfer phase and a setup phase of the migrate operation, in response to a failure condition occurring during the cut-over pre-commit phase; and restarting, by the processor, the migrate operation. The method further includes retrying, by the processor, the task associated with the migrate operation, in response to a network error detected at the source cluster, the destination cluster or both the source and the destination cluster.

In yet another aspect, methods and systems for Vserver migration are provided. One method includes executing (B518, FIG. 5B), by the processor, a transfer phase of a migrate operation for migrating a source Vserver of a source cluster to a destination cluster, the transfer phase using asynchronous baseline transfer to transfer data and configuration of a plurality of source storage volumes configured in a CG for the migrate operation to a plurality of destination storage volumes of a destination cluster, the asynchronous baseline transfer being managed as a group; updating (B540, FIG. 5C), by the processor, a state of each of the plurality of source storage volumes to a sync state indicating completion of a pre-commit phase of the migrate operation to initiate a commit phase of the migrate operation; locking (B548, FIG. 5D), by the processor, the source Vserver to prevent any configuration changes for a certain duration during the commit phase, while persistently maintaining a state of the migrate operation at both the source cluster and the destination cluster; generating (B552, FIG. 5D), by the processor, a snapshot of the plurality of destination storage volumes for performing data integrity checks between data stored at the source cluster and migrated data at the destination cluster, after completing the commit phase; transitioning (B550, FIG. 5D), by the processor, the migrate operation state to a point of no return state (PONR), upon completing the commit phase and initializing (B552, FIG. 5D) the Vserver at the destination cluster for processing input/output requests; and retaining, by the processor, a snapshot of the source Vserver and restarting the source Vserver, if the migrate operation fails.

The method further includes entering (610, FIG. 6), by the processor, a pause state during the transfer phase of the migration operation; and aborting (618, FIG. 6), by the processor, the migrate operation from the pause state and deleting objects created for the migrate operation. The method further includes applying (B554, FIG. 5D), by the processor, a last configuration of the plurality of source volumes at the destination cluster, after completing the commit phase. The method further includes cancelling (B568, FIG. 5E), by the processor, a timer at the source cluster, in response to reaching the PONR state of the migrate operation, the timer used to track the certain duration for the commit phase.

The method further includes updating (B552, FIG. 5D), by the processor, during the commit phase, configuration of the plurality of destination storage volumes for allowing read and write operations from the destination cluster. The method further includes executing, by the processor, a migrate orchestrator thread (342, FIG. 4B) in a user space (440B, FIG. 4B) of an owning node of the destination cluster for managing tasks associated with the migrate operation. The method further includes executing, by the processor, a failure thread (406B, FIG. 4A) in a user space of an owning node of the destination cluster and in a user space of an owning node of the source cluster for managing failure conditions during the migrate operation.

Methods and systems for Vserver migration are provided. One method includes maintaining (FIG. 6), by the processor, a state of a migrate operation for migrating a plurality of source storage volumes managed by a source Vserver of a source cluster to a plurality of destination storage volumes of a destination cluster of a networked storage environment; restarting (B721, FIG. 7E), by the processor, a process at a healthy node of the source cluster or the destination cluster to continue the migrate operation, in response to detecting an unhealthy node at the source cluster or the destination cluster executing the process; retrying (B733, FIG. 7E), by the processor, a task associated with the migrate operation experiencing intermittent failure for a certain number of times, and upon successful execution, continuing the migration operation; and checking (B729, FIG. 7E), by the processor, the state of the migrate operation and in response to the state of the migrate operation, continuing the migrate operation or restarting the migration operation.

The method further includes determining (B703, FIG. 7E), by the processor, an inter-cluster failure between the source cluster and the destination cluster occurring while the migrate operation is at a point of no return (PONR); and restarting (B707, FIG. 7E), by the processor, the source Vserver at the source cluster and the migrate operation. The method further includes undoing (B751, FIG. 7E), by the processor, any tasks executed during a setup phase of the migrate operation, in response to a failure condition occurring during the setup phase; and restarting (B753, FIG. 7E), by the processor, the migrate operation.

The method further includes undoing (B716 and B718, FIG. 7F), by the processor, any tasks executed during a transfer phase and a setup phase of the migrate operation, in response to a failure condition occurring during the transfer phase; and restarting, by the processor, the migrate operation. The method further includes undoing (B720, FIG. 7F), by the processor, any tasks executed during a cut-over pre-commit phase, a transfer phase and a setup phase of the migrate operation, in response to a failure condition occurring during the cut-over pre-commit phase; and restarting, by the processor, the migrate operation. The method further includes retrying, by the processor, the task associated with the migrate operation, in response to a network error detected at the source cluster, the destination cluster or both the source and the destination cluster.

Operating System: FIG. 8 illustrates a generic example of storage operating system 306 executed by node 208.1, according to one aspect of the present disclosure. The storage operating system 306 manages all the storage volumes and conducts read and write operations.

In one example, storage operating system 306 may include several modules, or “layers” executed by one or both of network module 214 and storage module 216. These layers include a file system manager 800 that keeps track of a directory structure (hierarchy) of the data stored in storage devices and manages read/write operations, i.e., executes read/write operations on storage in response to client 204.1/204.2 requests.

The storage operating system 306 may also include a protocol layer 802 and an associated network access layer 806, to allow node 208.1 to communicate over a network with other systems, such as clients 204.1/204.2. Protocol layer 802 may implement one or more of various higher-level network protocols, such as NFS, CIFS, Hypertext Transfer Protocol (HTTP), TCP/IP and others, as described below.

Network access layer 806 may include one or more drivers, which implement one or more lower-level protocols to communicate over the network, such as Ethernet. Interactions between clients 204.1/204.2 and mass storage devices 212.1 are illustrated schematically as a path, which illustrates the flow of data through operating system 306.

The operating system 306 may also include a storage access layer 804 and an associated storage driver layer 808 to allow the storage module 216 to communicate with a storage device. The storage access layer 804 may implement a higher-level storage protocol, such as RAID, while the storage driver layer 808 may implement a lower-level storage device access protocol, such as FC or SCSI.

As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a computer to perform a storage function that manages data access and may, in the case of a node 208.1, implement data access semantics of a general-purpose operating system. The storage operating system can also be implemented as a microkernel, an application program operating over a general-purpose operating system, such as UNIX® or Windows XP®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.

In addition, it will be understood by those skilled in the art that the invention described herein may apply to any type of special-purpose (e.g., file server, filer or storage serving appliance) or general-purpose computer, including a standalone computer or portion thereof, embodied as or including a storage system. Moreover, the teachings of this disclosure can be adapted to a variety of storage system architectures including, but not limited to, a network-attached storage environment, a storage area network and a storage device directly attached to a client or host computer. The term “storage system” should therefore be taken broadly to include such arrangements in addition to any subsystems configured to perform a storage function and associated with other equipment or systems. It should be noted that while this description is written in terms of a write anywhere file system, the teachings of the present invention may be utilized with any suitable file system, including a write in place file system.

Processing System: FIG. 9 is a high-level block diagram showing an example of the architecture of a processing system that may be used according to one aspect. The processing system 900 can represent management system 132, client 104 or storage system 1120, for example. Note that certain standard and well-known components which are not germane to the present invention are not shown in FIG. 9.

The processing system 900 includes one or more processor(s) 902 and memory 904, coupled to a bus system 905. The bus system 905 shown in FIG. 9 is an abstraction that represents any one or more separate physical buses and/or point-to-point connections, connected by appropriate bridges, adapters and/or controllers. The bus system 905, therefore, may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (sometimes referred to as “Firewire”).

The processor(s) 902 are the central processing units (CPUs) of the processing system 900 and, thus, control its overall operation. In certain aspects, the processors 902 accomplish this by executing software stored in memory 904. A processor 902 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.

Memory 904 represents any form of random-access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. Memory 904 includes the main memory of the processing system 900. Software 906 which implements the process steps described above with respect to FIGS. 5A-5F, 6 and 7A-7F may reside in and execute (by processors 902) from memory 904.

Also connected to the processors 902 through the bus system 905 are one or more internal mass storage devices 910, and a network adapter 912. Internal mass storage devices 910 may be or include any conventional medium for storing large volumes of data in a non-volatile manner, such as one or more magnetic or optical based disks. The network adapter 912 provides the processing system 900 with the ability to communicate with remote devices (e.g., storage servers 20) over a network and may be, for example, an Ethernet adapter, a Fibre Channel adapter, or the like.

The processing system 900 also includes one or more input/output (I/O) devices 908 coupled to the bus system 905. The I/O devices 908 may include, for example, a display device, a keyboard, a mouse, etc.

Thus, innovative technology for migrating a storage virtual machine has been described. Note that references throughout this specification to “one aspect” or “an aspect” mean that a particular feature, structure, or characteristic described in connection with the aspect is included in at least one aspect of the present invention. Therefore, it is emphasized and should be appreciated that two or more references to “an aspect” or “one aspect” or “an alternative aspect” in various portions of this specification are not necessarily all referring to the same aspect. Furthermore, the features, structures or characteristics being referred to may be combined as suitable in one or more aspects of the invention, as will be recognized by those of ordinary skill in the art.

While the present disclosure is described above with respect to what is currently considered its preferred aspects, it is to be understood that the disclosure is not limited to that described above. To the contrary, the disclosure is intended to cover various modifications and equivalent arrangements within the spirit and scope of the appended claims.

What is claimed is:
 1. A method, comprising: generating, by a processor, a consistency group (CG) having a plurality of source storage volumes managed by a source storage virtual machine (Vserver) of a source cluster for a migrate operation to migrate the plurality of the source storage volumes as a group to a plurality of destination storage volumes of a destination cluster; establishing, by the processor, a mirroring relationship between the source cluster and the destination cluster for managing asynchronous transfer of the plurality source storage volumes in the CG to the plurality of destination storage volumes during a transfer phase of the migrate operation; replicating, by the processor, a logical interface of the source cluster to the destination cluster, the logical interface providing a network address to access the source cluster; and automatically selecting, by the processor, a destination port at the destination cluster, associated with the replicated logical interface.
 2. The method of claim 1, further comprising: determining, by the processor, that the plurality of source volumes is configured with a storage space guarantee; and selecting, by the processor, the plurality of destination storage volumes at the destination cluster with enough storage space to meet the space guarantee.
 3. The method of claim 1, further comprising: selecting, by the processor, the plurality of destination storage volumes from a list of aggregates identified by a client system.
 4. The method of claim 1, further comprising: selecting, by the processor, the plurality of destination storage volumes based on the available performance capacity of a destination aggregate at the destination, the available performance capacity based on latency associated in processing input/output requests and storage utilization of the destination aggregate.
 5. The method of claim 1, further comprising: updating, by the processor, the asynchronous transfer of the plurality of source storage volumes to the plurality of destination storage volumes from an asynchronous state to a synchronous state.
 6. The method of claim 1, further comprising: confirming, by the processor, during a setup phase of the migrate operation, connectivity between the destination port and a source port by transmitting a network packet from the source port to the destination port and receiving a response from the destination port, the source port used to access storage via the source Vserver.
 7. The method of claim 1, further comprising: setting, by the processor, an auto-cutover option to automatically transition the migrate operation from the transfer phase to a cut-over phase when for a certain duration of the cut-over phase access to data by the source cluster is disabled.
 8. A non-transitory, machine readable storage medium having stored thereon instructions comprising machine executable code, which when executed by a machine, causes the machine to: generate a consistency group (CG) having a plurality of source storage volumes managed by a source storage virtual machine (Vserver) of a source cluster for a migrate operation to migrate the plurality of the source storage volumes as a group to a plurality of destination storage volumes of a destination cluster; establish a mirroring relationship between the source cluster and the destination cluster for managing asynchronous transfer of the plurality source storage volumes in the CG to the plurality of destination storage volumes during a transfer phase of the migrate operation; replicate a logical interface of the source cluster to the destination cluster, the logical interface providing a network address to access the source cluster; and automatically select a destination port at the destination cluster, associated with the replicated logical interface.
 9. The non-transitory, machine readable storage medium of claim 8, wherein the machine executable code further causes the machine to: determine that the plurality of source volumes is configured with a storage space guarantee; and select the plurality of destination storage volumes at the destination cluster with enough storage space to meet the space guarantee.
 10. The non-transitory, machine readable storage medium of claim 8, wherein the machine executable code further causes the machine to: select the plurality of destination storage volumes from a list of aggregates identified by a client system.
 11. The non-transitory, machine readable storage medium of claim 8, wherein the machine executable code further causes the machine to: select the plurality of destination storage volumes based on the available performance capacity of a destination aggregate at the destination, the available performance capacity based on latency associated in processing input/output requests and storage utilization of the destination aggregate.
 12. The non-transitory, machine readable storage medium of claim 8, wherein the machine executable code further causes the machine to: update the baseline transfer of the plurality of source storage volumes to the plurality of destination storage volumes from an asynchronous state to a synchronous state.
 13. The non-transitory, machine readable storage medium of claim 8, wherein the machine executable code further causes the machine to: confirm during a setup phase of the migrate operation, connectivity between the destination port and a source port by transmitting a network packet from the source port to the destination port and receiving a response from the destination port, the source port used to access storage via the source Vserver.
 14. The non-transitory, machine readable storage medium of claim 8, wherein the machine executable code further causes the machine to: set an auto-cutover option to automatically transition the migrate operation from the transfer phase to a cut-over phase when for a certain duration of the cut-over phase access to data by the source cluster is disabled.
 15. A system, comprising: a memory containing machine readable medium comprising machine executable code having stored thereon instructions; and a processor coupled to the memory to execute the machine executable code to: generate a consistency group (CG) having a plurality of source storage volumes managed by a source storage virtual machine (Vserver) of a source cluster for a migrate operation to migrate the plurality of the source storage volumes as a group to a plurality of destination storage volumes of a destination cluster; establish a mirroring relationship between the source cluster and the destination cluster for managing asynchronous transfer of the plurality source storage volumes in the CG to the plurality of destination storage volumes during a transfer phase of the migrate operation; replicate a logical interface of the source cluster to the destination cluster, the logical interface providing a network address to access the source cluster; and automatically select a destination port at the destination cluster, associated with the replicated logical interface.
 16. The system of claim 15, wherein the machine executable code further causes to: determine that the plurality of source volumes is configured with a storage space guarantee; and select the plurality of destination storage volumes at the destination cluster with enough storage space to meet the space guarantee.
 17. The system of claim 15, wherein the machine executable code further causes to: select the plurality of destination storage volumes from a list of aggregates identified by a client system.
 18. The system of claim 15, wherein the machine executable code further causes to: select the plurality of destination storage volumes based on the available performance capacity of a destination aggregate at the destination, the available performance capacity based on latency associated in processing input/output requests and storage utilization of the destination aggregate.
 19. The system of claim 15, wherein the machine executable code further causes to: update the asynchronous transfer of the plurality of source storage volumes to the plurality of destination storage volumes from an asynchronous state to a synchronous state.
 20. The system of claim 15, wherein the machine executable code further causes to: set an auto-cutover option to automatically transition the migrate operation from the transfer phase to a cut-over phase when for a certain duration of the cut-over phase access to data by the source cluster is disabled.